Subject: Re: [RFC PATCH v2 1/3] mm/gup: fix gup_fast with dynamic page table folding
To: Gerald Schaefer, Jason Gunthorpe, John Hubbard
Cc: Peter Zijlstra, Dave Hansen, linux-mm, Paul Mackerras, linux-sparc,
 Alexander Gordeev, Claudio Imbrenda, Will Deacon, linux-arch, linux-s390,
 Vasily Gorbik, Richard Weinberger, linux-x86, Russell King,
 Christian Borntraeger, Ingo Molnar, Catalin Marinas, Andrey Ryabinin,
 Heiko Carstens, Arnd Bergmann, Jeff Dike, linux-um, Borislav Petkov,
 Andy Lutomirski, Thomas Gleixner, linux-arm, linux-power, LKML,
 Andrew Morton, Linus Torvalds, Mike Rapoport
References: <20200907180058.64880-1-gerald.schaefer@linux.ibm.com>
 <20200907180058.64880-2-gerald.schaefer@linux.ibm.com>
From: Christophe Leroy
Message-ID: <82fbe8f9-f199-5fc2-4168-eb43ad0b0346@csgroup.eu>
Date: Tue, 8 Sep 2020 07:06:23 +0200
In-Reply-To: <20200907180058.64880-2-gerald.schaefer@linux.ibm.com>

On 07/09/2020 at 20:00, Gerald Schaefer wrote:
> From: Alexander Gordeev
>
> Commit 1a42010cdc26 ("s390/mm: convert to the generic get_user_pages_fast
> code") introduced a subtle but severe bug on s390 with gup_fast, due to
> dynamic page table folding.
>
> The question "What would it require for the generic code to work for s390"
> has already been discussed here
> https://lkml.kernel.org/r/20190418100218.0a4afd51@mschwideX1
> and ended with a promising approach here
> https://lkml.kernel.org/r/20190419153307.4f2911b5@mschwideX1
> which in the end unfortunately didn't quite work completely.
>
> We tried to mimic static level folding by changing pgd_offset to always
> calculate top level page table offset, and do nothing in folded pXd_offset.
> What has been overlooked is that PxD_SIZE/MASK and thus pXd_addr_end do
> not reflect this dynamic behaviour, and still act like static 5-level
> page tables.
>

[...]

>
> Fix this by introducing new pXd_addr_end_folded helpers, which take an
> additional pXd entry value parameter, that can be used on s390
> to determine the correct page table level and return corresponding
> end / boundary.
> With that, the pointer iteration will always
> happen in gup_pgd_range for s390. No change for other architectures
> introduced.

I am not sure pXd_addr_end_folded() is the most understandable name,
although I don't have any alternative suggestion at the moment.
Maybe it could be something like pXd_addr_end_fixup(), as it will disappear
in the next patch, or pXd_addr_end_gup()?

Also, if it happens to be acceptable to get patch 2 in stable, I think
you should switch patch 1 and patch 2 to avoid the step through
pXd_addr_end_folded().

>
> Fixes: 1a42010cdc26 ("s390/mm: convert to the generic get_user_pages_fast code")
> Cc: # 5.2+
> Reviewed-by: Gerald Schaefer
> Signed-off-by: Alexander Gordeev
> Signed-off-by: Gerald Schaefer
> ---
>  arch/s390/include/asm/pgtable.h | 42 +++++++++++++++++++++++++++++++++
>  include/linux/pgtable.h         | 16 +++++++++++++
>  mm/gup.c                        |  8 +++----
>  3 files changed, 62 insertions(+), 4 deletions(-)
>
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index 7eb01a5459cd..027206e4959d 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -512,6 +512,48 @@ static inline bool mm_pmd_folded(struct mm_struct *mm)
>  }
>  #define mm_pmd_folded(mm) mm_pmd_folded(mm)
>  
> +/*
> + * With dynamic page table levels on s390, the static pXd_addr_end() functions
> + * will not return corresponding dynamic boundaries. This is no problem as long
> + * as only pXd pointers are passed down during page table walk, because
> + * pXd_offset() will simply return the given pointer for folded levels, and the
> + * pointer iteration over a range simply happens at the correct page table
> + * level.
> + * It is however a problem with gup_fast, or other places walking the page
> + * tables w/o locks using READ_ONCE(), and passing down the pXd values instead
> + * of pointers. In this case, the pointer given to pXd_offset() is a pointer to
> + * a stack variable, which cannot be used for pointer iteration at the correct
> + * level. Instead, the iteration then has to happen by going up to pgd level
> + * again. To allow this, provide pXd_addr_end_folded() functions with an
> + * additional pXd value parameter, which can be used on s390 to determine the
> + * folding level and return the corresponding boundary.
> + */
> +static inline unsigned long rste_addr_end_folded(unsigned long rste, unsigned long addr, unsigned long end)

What does 'rste' stand for?

Isn't this line a bit long?

> +{
> +	unsigned long type = (rste & _REGION_ENTRY_TYPE_MASK) >> 2;
> +	unsigned long size = 1UL << (_SEGMENT_SHIFT + type * 11);
> +	unsigned long boundary = (addr + size) & ~(size - 1);
> +
> +	/*
> +	 * FIXME The below check is for internal testing only, to be removed
> +	 */
> +	VM_BUG_ON(type < (_REGION_ENTRY_TYPE_R3 >> 2));
> +
> +	return (boundary - 1) < (end - 1) ? boundary : end;
> +}
> +
> +#define pgd_addr_end_folded pgd_addr_end_folded
> +static inline unsigned long pgd_addr_end_folded(pgd_t pgd, unsigned long addr, unsigned long end)
> +{
> +	return rste_addr_end_folded(pgd_val(pgd), addr, end);
> +}
> +
> +#define p4d_addr_end_folded p4d_addr_end_folded
> +static inline unsigned long p4d_addr_end_folded(p4d_t p4d, unsigned long addr, unsigned long end)
> +{
> +	return rste_addr_end_folded(p4d_val(p4d), addr, end);
> +}
> +
>  static inline int mm_has_pgste(struct mm_struct *mm)
>  {
>  #ifdef CONFIG_PGSTE
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index e8cbc2e795d5..981c4c2a31fe 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -681,6 +681,22 @@ static inline int arch_unmap_one(struct mm_struct *mm,
>  })
>  #endif
>  
> +#ifndef pgd_addr_end_folded
> +#define pgd_addr_end_folded(pgd, addr, end)	pgd_addr_end(addr, end)
> +#endif
> +
> +#ifndef p4d_addr_end_folded
> +#define p4d_addr_end_folded(p4d, addr, end)	p4d_addr_end(addr, end)
> +#endif
> +
> +#ifndef pud_addr_end_folded
> +#define pud_addr_end_folded(pud, addr, end)	pud_addr_end(addr, end)
> +#endif
> +
> +#ifndef pmd_addr_end_folded
> +#define pmd_addr_end_folded(pmd, addr, end)	pmd_addr_end(addr, end)
> +#endif
> +
>  /*
>   * When walking page tables, we usually want to skip any p?d_none entries;
>   * and any p?d_bad entries - reporting the error before resetting to none.
> diff --git a/mm/gup.c b/mm/gup.c
> index bd883a112724..ba4aace5d0f4 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2521,7 +2521,7 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
>  	do {
>  		pmd_t pmd = READ_ONCE(*pmdp);
>  
> -		next = pmd_addr_end(addr, end);
> +		next = pmd_addr_end_folded(pmd, addr, end);
>  		if (!pmd_present(pmd))
>  			return 0;
>  
> @@ -2564,7 +2564,7 @@ static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
>  	do {
>  		pud_t pud = READ_ONCE(*pudp);
>  
> -		next = pud_addr_end(addr, end);
> +		next = pud_addr_end_folded(pud, addr, end);
>  		if (unlikely(!pud_present(pud)))
>  			return 0;
>  		if (unlikely(pud_huge(pud))) {
> @@ -2592,7 +2592,7 @@ static int gup_p4d_range(pgd_t pgd, unsigned long addr, unsigned long end,
>  	do {
>  		p4d_t p4d = READ_ONCE(*p4dp);
>  
> -		next = p4d_addr_end(addr, end);
> +		next = p4d_addr_end_folded(p4d, addr, end);
>  		if (p4d_none(p4d))
>  			return 0;
>  		BUILD_BUG_ON(p4d_huge(p4d));
> @@ -2617,7 +2617,7 @@ static void gup_pgd_range(unsigned long addr, unsigned long end,
>  	do {
>  		pgd_t pgd = READ_ONCE(*pgdp);
>  
> -		next = pgd_addr_end(addr, end);
> +		next = pgd_addr_end_folded(pgd, addr, end);
>  		if (pgd_none(pgd))
>  			return;
>  		if (unlikely(pgd_huge(pgd))) {
>

Christophe
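
For readers following the boundary arithmetic in the quoted rste_addr_end_folded(), here is a minimal user-space sketch of the same computation. It is only an illustration, not the kernel code: the constant values below are assumptions meant to mirror the s390 definitions (they do not appear in the patch), the function name addr_end_folded_model() is hypothetical, and assert() stands in for VM_BUG_ON().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed values, intended to mirror the s390 definitions of
 * _SEGMENT_SHIFT and _REGION_ENTRY_TYPE_*; they are not part of the patch.
 */
#define SEGMENT_SHIFT          20
#define REGION_ENTRY_TYPE_MASK 0x0cULL
#define REGION_ENTRY_TYPE_R3   0x04ULL
#define REGION_ENTRY_TYPE_R2   0x08ULL
#define REGION_ENTRY_TYPE_R1   0x0cULL

/*
 * Hypothetical user-space model of the quoted helper: derive the table
 * level from the entry's type bits, compute the size covered by one
 * entry at that level, and return the next boundary clamped to 'end'.
 */
static inline uint64_t addr_end_folded_model(uint64_t rste, uint64_t addr,
					     uint64_t end)
{
	/* Type bits select the level: R3 -> 1, R2 -> 2, R1 -> 3. */
	uint64_t type = (rste & REGION_ENTRY_TYPE_MASK) >> 2;
	/* Each level up covers 11 more address bits than the one below. */
	uint64_t size = 1ULL << (SEGMENT_SHIFT + type * 11);
	/* Round addr up to the next multiple of size. */
	uint64_t boundary = (addr + size) & ~(size - 1);

	/* Stand-in for the VM_BUG_ON() in the quoted patch. */
	assert(type >= (REGION_ENTRY_TYPE_R3 >> 2));

	/* The "- 1" on both sides keeps the comparison valid if a value wraps to 0. */
	return (boundary - 1) < (end - 1) ? boundary : end;
}
```

Under these assumed constants, an entry of type R3 yields a 2 GB step, R2 a 4 TB step and R1 an 8 PB step, so the same call returns a different boundary depending on the level encoded in the entry value, which is what the static pXd_addr_end() macros cannot do.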