From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.1 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 24CE3C352A2 for ; Thu, 6 Feb 2020 05:46:32 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id B09B0214AF for ; Thu, 6 Feb 2020 05:46:31 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=c-s.fr header.i=@c-s.fr header.b="h8SXhH9C" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B09B0214AF Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=c-s.fr Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 5FC5A6B0003; Thu, 6 Feb 2020 00:46:31 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 585FB6B0006; Thu, 6 Feb 2020 00:46:31 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 426806B0007; Thu, 6 Feb 2020 00:46:31 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0097.hostedemail.com [216.40.44.97]) by kanga.kvack.org (Postfix) with ESMTP id 2353B6B0003 for ; Thu, 6 Feb 2020 00:46:31 -0500 (EST) Received: from smtpin28.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id C5354181AEF07 for ; Thu, 6 Feb 2020 05:46:30 +0000 (UTC) X-FDA: 76458617340.28.women66_901207a1c9d07 X-HE-Tag: women66_901207a1c9d07 X-Filterd-Recvd-Size: 9794 Received: from pegase1.c-s.fr (pegase1.c-s.fr [93.17.236.30]) by imf34.hostedemail.com (Postfix) with ESMTP for ; Thu, 6 Feb 2020 05:46:29 +0000 (UTC) Received: from localhost (mailhub1-int [192.168.12.234]) by localhost (Postfix) with ESMTP id 48CnVW2zD1z9v024; Thu, 6 Feb 2020 06:46:27 +0100 (CET) Authentication-Results: localhost; dkim=pass reason="1024-bit key; insecure key" header.d=c-s.fr header.i=@c-s.fr header.b=h8SXhH9C; dkim-adsp=pass; dkim-atps=neutral X-Virus-Scanned: Debian amavisd-new at c-s.fr Received: from pegase1.c-s.fr ([192.168.12.234]) by localhost (pegase1.c-s.fr [192.168.12.234]) (amavisd-new, port 10024) with ESMTP id 6m1t9Kik8HIJ; Thu, 6 Feb 2020 06:46:27 +0100 (CET) Received: from messagerie.si.c-s.fr (messagerie.si.c-s.fr [192.168.25.192]) by pegase1.c-s.fr (Postfix) with ESMTP id 48CnVW1P9yz9v01y; Thu, 6 Feb 2020 06:46:27 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=c-s.fr; s=mail; t=1580967987; bh=EEYv+zTzd9qo5V5Yah42Eq7K5Bol2cKDrKt1OC5EqeA=; h=Subject:To:Cc:References:From:Date:In-Reply-To:From; b=h8SXhH9CINBhzxoawEfiaQhcpGYYlkCHb5JEs48IVhBekiJ64SiIpqe6fXiA3rqln WAoKLqvJ1kBesxZn6PVqsffW9G05yXC4JbwiQZf52Ec5mt6qzFhf12fhUeD5nyKrUy qSm9Tle02ACObMn+dv64Har56iHXDlphw+4s1C00= Received: from localhost (localhost [127.0.0.1]) by messagerie.si.c-s.fr (Postfix) with ESMTP id 271E68B787; Thu, 6 Feb 2020 06:46:27 +0100 (CET) X-Virus-Scanned: amavisd-new at c-s.fr Received: from messagerie.si.c-s.fr ([127.0.0.1]) by localhost (messagerie.si.c-s.fr [127.0.0.1]) (amavisd-new, port 10023) with ESMTP id qN1rH3Apcu68; Thu, 6 Feb 2020 06:46:27 +0100 (CET) Received: from [192.168.4.90] (unknown [192.168.4.90]) by messagerie.si.c-s.fr (Postfix) with ESMTP id 04F048B776; Thu, 6 Feb 2020 06:46:18 +0100 (CET) Subject: Re: [PATCH v6 03/11] powerpc/mm: Adds arch-specificic functions to track lockless pgtable walks To: Leonardo Bras , Benjamin Herrenschmidt , Paul Mackerras , Michael Ellerman , Arnd Bergmann , Andrew Morton , "Aneesh Kumar K.V" , Nicholas Piggin , Steven Price , Robin Murphy , Mahesh Salgaonkar , Balbir Singh , Reza Arbab , Thomas Gleixner , Allison Randal , Greg Kroah-Hartman , Mike Rapoport , Michal Suchanek Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, kvm-ppc@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org References: <20200206030900.147032-1-leonardo@linux.ibm.com> <20200206030900.147032-4-leonardo@linux.ibm.com> From: Christophe Leroy Message-ID: <1311ce1c-7e5a-f7c4-2ab2-c03e124ca1c1@c-s.fr> Date: Thu, 6 Feb 2020 06:46:18 +0100 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.4.2 MIME-Version: 1.0 In-Reply-To: <20200206030900.147032-4-leonardo@linux.ibm.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: fr Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Le 06/02/2020 =C3=A0 04:08, Leonardo Bras a =C3=A9crit=C2=A0: > On powerpc, we need to do some lockless pagetable walks from functions > that already have disabled interrupts, specially from real mode with > MSR[EE=3D0]. >=20 > In these contexts, disabling/enabling interrupts can be very troubling. When interrupts are already disabled, the flag returned when disabling=20 it will be such that when we restore it later, interrupts remain=20 disabled, so what's the problem ? >=20 > So, this arch-specific implementation features functions with an extra > argument that allows interrupt enable/disable to be skipped: > __begin_lockless_pgtbl_walk() and __end_lockless_pgtbl_walk(). >=20 > Functions similar to the generic ones are also exported, by calling > the above functions with parameter {en,dis}able_irq =3D true. >=20 > Signed-off-by: Leonardo Bras > --- > arch/powerpc/include/asm/book3s/64/pgtable.h | 6 ++ > arch/powerpc/mm/book3s64/pgtable.c | 86 +++++++++++++++++++= - > 2 files changed, 91 insertions(+), 1 deletion(-) >=20 > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerp= c/include/asm/book3s/64/pgtable.h > index 201a69e6a355..78f6ffb1bb3e 100644 > --- a/arch/powerpc/include/asm/book3s/64/pgtable.h > +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h > @@ -1375,5 +1375,11 @@ static inline bool pgd_is_leaf(pgd_t pgd) > return !!(pgd_raw(pgd) & cpu_to_be64(_PAGE_PTE)); > } > =20 > +#define __HAVE_ARCH_LOCKLESS_PGTBL_WALK_CONTROL > +unsigned long begin_lockless_pgtbl_walk(void); > +unsigned long __begin_lockless_pgtbl_walk(bool disable_irq); > +void end_lockless_pgtbl_walk(unsigned long irq_mask); > +void __end_lockless_pgtbl_walk(unsigned long irq_mask, bool enable_irq= ); > + Why not make them static inline just like the generic ones ? > #endif /* __ASSEMBLY__ */ > #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */ > diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3= s64/pgtable.c > index 2bf7e1b4fd82..535613030363 100644 > --- a/arch/powerpc/mm/book3s64/pgtable.c > +++ b/arch/powerpc/mm/book3s64/pgtable.c > @@ -82,6 +82,7 @@ static void do_nothing(void *unused) > { > =20 > } > + Is this blank line related to the patch ? > /* > * Serialize against find_current_mm_pte which does lock-less > * lookup in page tables with local interrupts disabled. For huge pag= es > @@ -98,6 +99,89 @@ void serialize_against_pte_lookup(struct mm_struct *= mm) > smp_call_function_many(mm_cpumask(mm), do_nothing, NULL, 1); > } > =20 > +/* begin_lockless_pgtbl_walk: Must be inserted before a function call = that does > + * lockless pagetable walks, such as __find_linux_pte(). > + * This version allows setting disable_irq=3Dfalse, so irqs are not to= uched, which > + * is quite useful for running when ints are already disabled (like = real-mode) > + */ > +inline > +unsigned long __begin_lockless_pgtbl_walk(bool disable_irq) > +{ > + unsigned long irq_mask =3D 0; > + > + /* > + * Interrupts must be disabled during the lockless page table walk. > + * That's because the deleting or splitting involves flushing TLBs, > + * which in turn issues interrupts, that will block when disabled. > + * > + * When this function is called from realmode with MSR[EE=3D0], > + * it's not needed to touch irq, since it's already disabled. > + */ > + if (disable_irq) > + local_irq_save(irq_mask); > + > + /* > + * This memory barrier pairs with any code that is either trying to > + * delete page tables, or split huge pages. Without this barrier, > + * the page tables could be read speculatively outside of interrupt > + * disabling or reference counting. > + */ > + smp_mb(); > + > + return irq_mask; > +} > +EXPORT_SYMBOL(__begin_lockless_pgtbl_walk); > + > +/* begin_lockless_pgtbl_walk: Must be inserted before a function call = that does > + * lockless pagetable walks, such as __find_linux_pte(). > + * This version is used by generic code, and always assume irqs will b= e disabled > + */ > +unsigned long begin_lockless_pgtbl_walk(void) > +{ > + return __begin_lockless_pgtbl_walk(true); > +} > +EXPORT_SYMBOL(begin_lockless_pgtbl_walk); Even more than begin_lockless_pgtbl_walk(), this one is worth being=20 static inline in the H file. > + > +/* > + * __end_lockless_pgtbl_walk: Must be inserted after the last use of a= pointer > + * returned by a lockless pagetable walk, such as __find_linux_pte() > + * This version allows setting enable_irq=3Dfalse, so irqs are not tou= ched, which > + * is quite useful for running when ints are already disabled (like = real-mode) > + */ > +inline void __end_lockless_pgtbl_walk(unsigned long irq_mask, bool ena= ble_irq) > +{ > + /* > + * This memory barrier pairs with any code that is either trying to > + * delete page tables, or split huge pages. Without this barrier, > + * the page tables could be read speculatively outside of interrupt > + * disabling or reference counting. > + */ > + smp_mb(); > + > + /* > + * Interrupts must be disabled during the lockless page table walk. > + * That's because the deleting or splitting involves flushing TLBs, > + * which in turn issues interrupts, that will block when disabled. > + * > + * When this function is called from realmode with MSR[EE=3D0], > + * it's not needed to touch irq, since it's already disabled. > + */ > + if (enable_irq) > + local_irq_restore(irq_mask); > +} > +EXPORT_SYMBOL(__end_lockless_pgtbl_walk); > + > +/* > + * end_lockless_pgtbl_walk: Must be inserted after the last use of a p= ointer > + * returned by a lockless pagetable walk, such as __find_linux_pte() > + * This version is used by generic code, and always assume irqs will b= e enabled > + */ > +void end_lockless_pgtbl_walk(unsigned long irq_mask) > +{ > + __end_lockless_pgtbl_walk(irq_mask, true); > +} > +EXPORT_SYMBOL(end_lockless_pgtbl_walk); > + > /* > * We use this to invalidate a pmdp entry before switching from a > * hugepte to regular pmd entry. > @@ -487,7 +571,7 @@ static int __init setup_disable_tlbie(char *str) > tlbie_capable =3D false; > tlbie_enabled =3D false; > =20 > - return 1; > + return 1; Is that related to this patch at all ? > } > __setup("disable_tlbie", setup_disable_tlbie); > =20 >=20 Christophe