From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <43d852bb-ab5f-40be-b188-166c57ab795c@arm.com>
Date: Thu, 13 Feb 2025 09:38:52 +0000
From: Ryan Roberts <ryan.roberts@arm.com>
Subject: Re: [PATCH v1 16/16] arm64/mm: Defer barriers when updating kernel mappings
To: Anshuman Khandual, Catalin Marinas, Will Deacon, Muchun Song,
 Pasha Tatashin, Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
 Mark Rutland, Ard Biesheuvel, Dev Jain, Alexandre Ghiti, Steve Capper,
 Kevin Brodsky
Cc: linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <20250205151003.88959-1-ryan.roberts@arm.com>
 <20250205151003.88959-17-ryan.roberts@arm.com>
 <9bc5527e-16f4-45cc-aced-55b1ace6c143@arm.com>
 <0052097e-2284-4f9e-b37c-2ca2de527667@arm.com>
In-Reply-To: <0052097e-2284-4f9e-b37c-2ca2de527667@arm.com>

On 13/02/2025 05:30, Anshuman Khandual wrote:
>
>
> On 2/10/25 16:42, Ryan Roberts wrote:
>> On 10/02/2025 08:03, Anshuman Khandual wrote:
>>>
>>>
>>> On 2/5/25 20:39, Ryan Roberts wrote:
>>>> Because the kernel can't tolerate page faults for kernel mappings, when
>>>> setting a valid, kernel space pte (or pmd/pud/p4d/pgd), it emits a
>>>> dsb(ishst) to ensure that the store to the pgtable is observed by the
>>>> table walker immediately. Additionally it emits an isb() to ensure that
>>>> any already speculatively determined invalid mapping fault gets
>>>> canceled.
>>>>
>>>> We can improve the performance of vmalloc operations by batching these
>>>> barriers until the end of a set of entry updates. The newly added
>>>> arch_update_kernel_mappings_begin() / arch_update_kernel_mappings_end()
>>>> provide the required hooks.
>>>>
>>>> vmalloc improves by up to 30% as a result.
>>>>
>>>> Two new TIF_ flags are created: TIF_KMAP_UPDATE_ACTIVE tells us if we
>>>> are in batch mode and can therefore defer any barriers until the end
>>>> of the batch. TIF_KMAP_UPDATE_PENDING tells us if barriers are queued
>>>> to be emitted at the end of the batch.
>>>
>>> Why cannot this be achieved with a single TIF_KMAP_UPDATE_ACTIVE which is
>>> set in __begin(), cleared in __end() and saved across a __switch_to()?
>>
>> So unconditionally emit the barriers in _end(), and emit them in
>> __switch_to() if TIF_KMAP_UPDATE_ACTIVE is set?
>
> Right.
>
>> I guess if calling _begin() then you are definitely going to be setting at
>> least 1 PTE, so you can definitely emit the barriers unconditionally. I was
>> trying to protect against the case where you get pre-empted (potentially
>> multiple times) while in the loop. The TIF_KMAP_UPDATE_PENDING flag ensures
>> you only emit the barriers when you definitely need to. Without it, you
>> would have to emit on every pre-emption even if no more PTEs got set.
>>
>> But I suspect this is a premature optimization. Probably it will never
>> occur. So
>
> Agreed.

Having done this simplification, I've just noticed that one of the
arch_update_kernel_mappings_begin/end callsites is __apply_to_page_range(),
which gets called for user space mappings as well as kernel mappings. So with
the simplification in place I'll be emitting barriers even when only user
space mappings were touched.

I think there are a few options to fix this:

 - Revert to the 2-flag approach. For the user space case, I'll get to _end()
   and notice that no barriers are queued, so will emit nothing.

 - Only set TIF_KMAP_UPDATE_ACTIVE if the address range passed to _begin() is
   a kernel address range. I guess that's just a case of checking if the MSB
   is set in "end"?

 - Pass mm to _begin() and only set TIF_KMAP_UPDATE_ACTIVE if mm == &init_mm.
   This should amount to the same thing as option 2.

I'm leaning towards option 2, but I have a niggling feeling that my proposed
check isn't quite correct. Roughly, I'm imagining the sketch below.
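Untested sketch of option 2: the BIT(63) test is my assumption for "is this a
TTBR1 (kernel) address", and that assumption is exactly the part that needs
checking:

#define arch_update_kernel_mappings_begin arch_update_kernel_mappings_begin
static inline void arch_update_kernel_mappings_begin(unsigned long start,
						     unsigned long end)
{
	/*
	 * Assumption: kernel (TTBR1) VAs have the top bit set, so a user
	 * range (e.g. one coming via __apply_to_page_range()) never enters
	 * batch mode and its barriers are emitted per-PTE as today.
	 */
	if (end & BIT(63))
		set_thread_flag(TIF_KMAP_UPDATE_ACTIVE);
}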
What do you think?

Thanks,
Ryan

>
>> I'll simplify as you suggest.
>>
>> Thanks,
>> Ryan
>>
>>>
>>>>
>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>> ---
>>>>  arch/arm64/include/asm/pgtable.h     | 65 +++++++++++++++++++---------
>>>>  arch/arm64/include/asm/thread_info.h |  2 +
>>>>  arch/arm64/kernel/process.c          | 20 +++++++--
>>>>  3 files changed, 63 insertions(+), 24 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>>> index ff358d983583..1ee9b9588502 100644
>>>> --- a/arch/arm64/include/asm/pgtable.h
>>>> +++ b/arch/arm64/include/asm/pgtable.h
>>>> @@ -39,6 +39,41 @@
>>>>  #include
>>>>  #include
>>>>  #include
>>>> +#include
>>>> +
>>>> +static inline void emit_pte_barriers(void)
>>>> +{
>>>> +	dsb(ishst);
>>>> +	isb();
>>>> +}
>>>
>>> There are many sequences of these two barriers in this particular header,
>>> hence probably a good idea to factor this out into a common helper.
>>>
>>>> +
>>>> +static inline void queue_pte_barriers(void)
>>>> +{
>>>> +	if (test_thread_flag(TIF_KMAP_UPDATE_ACTIVE)) {
>>>> +		if (!test_thread_flag(TIF_KMAP_UPDATE_PENDING))
>>>> +			set_thread_flag(TIF_KMAP_UPDATE_PENDING);
>>>> +	} else
>>>> +		emit_pte_barriers();
>>>> +}
>>>> +
>>>> +#define arch_update_kernel_mappings_begin arch_update_kernel_mappings_begin
>>>> +static inline void arch_update_kernel_mappings_begin(unsigned long start,
>>>> +						     unsigned long end)
>>>> +{
>>>> +	set_thread_flag(TIF_KMAP_UPDATE_ACTIVE);
>>>> +}
>>>> +
>>>> +#define arch_update_kernel_mappings_end arch_update_kernel_mappings_end
>>>> +static inline void arch_update_kernel_mappings_end(unsigned long start,
>>>> +						   unsigned long end,
>>>> +						   pgtbl_mod_mask mask)
>>>> +{
>>>> +	if (test_thread_flag(TIF_KMAP_UPDATE_PENDING))
>>>> +		emit_pte_barriers();
>>>> +
>>>> +	clear_thread_flag(TIF_KMAP_UPDATE_PENDING);
>>>> +	clear_thread_flag(TIF_KMAP_UPDATE_ACTIVE);
>>>> +}
>>>>
>>>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>>  #define __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
>>>> @@ -323,10 +358,8 @@ static inline void __set_pte_complete(pte_t pte)
>>>>  	 * Only if the new pte is valid and kernel, otherwise TLB maintenance
>>>>  	 * or update_mmu_cache() have the necessary barriers.
>>>>  	 */
>>>> -	if (pte_valid_not_user(pte)) {
>>>> -		dsb(ishst);
>>>> -		isb();
>>>> -	}
>>>> +	if (pte_valid_not_user(pte))
>>>> +		queue_pte_barriers();
>>>>  }
>>>>
>>>>  static inline void __set_pte(pte_t *ptep, pte_t pte)
>>>> @@ -791,10 +824,8 @@ static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
>>>>
>>>>  	WRITE_ONCE(*pmdp, pmd);
>>>>
>>>> -	if (pmd_valid_not_user(pmd)) {
>>>> -		dsb(ishst);
>>>> -		isb();
>>>> -	}
>>>> +	if (pmd_valid_not_user(pmd))
>>>> +		queue_pte_barriers();
>>>>  }
>>>>
>>>>  static inline void pmd_clear(pmd_t *pmdp)
>>>> @@ -869,10 +900,8 @@ static inline void set_pud(pud_t *pudp, pud_t pud)
>>>>
>>>>  	WRITE_ONCE(*pudp, pud);
>>>>
>>>> -	if (pud_valid_not_user(pud)) {
>>>> -		dsb(ishst);
>>>> -		isb();
>>>> -	}
>>>> +	if (pud_valid_not_user(pud))
>>>> +		queue_pte_barriers();
>>>>  }
>>>>
>>>>  static inline void pud_clear(pud_t *pudp)
>>>> @@ -960,10 +989,8 @@ static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
>>>>
>>>>  	WRITE_ONCE(*p4dp, p4d);
>>>>
>>>> -	if (p4d_valid_not_user(p4d)) {
>>>> -		dsb(ishst);
>>>> -		isb();
>>>> -	}
>>>> +	if (p4d_valid_not_user(p4d))
>>>> +		queue_pte_barriers();
>>>>  }
>>>>
>>>>  static inline void p4d_clear(p4d_t *p4dp)
>>>> @@ -1098,10 +1125,8 @@ static inline void set_pgd(pgd_t *pgdp, pgd_t pgd)
>>>>
>>>>  	WRITE_ONCE(*pgdp, pgd);
>>>>
>>>> -	if (pgd_valid_not_user(pgd)) {
>>>> -		dsb(ishst);
>>>> -		isb();
>>>> -	}
>>>> +	if (pgd_valid_not_user(pgd))
>>>> +		queue_pte_barriers();
>>>>  }
>>>>
>>>>  static inline void pgd_clear(pgd_t *pgdp)
>>>> diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
>>>> index 1114c1c3300a..382d2121261e 100644
>>>> --- a/arch/arm64/include/asm/thread_info.h
>>>> +++ b/arch/arm64/include/asm/thread_info.h
>>>> @@ -82,6 +82,8 @@ void arch_setup_new_exec(void);
>>>>  #define TIF_SME_VL_INHERIT	28	/* Inherit SME vl_onexec across exec */
>>>>  #define TIF_KERNEL_FPSTATE	29	/* Task is in a kernel mode FPSIMD section */
>>>>  #define TIF_TSC_SIGSEGV		30	/* SIGSEGV on counter-timer access */
>>>> +#define TIF_KMAP_UPDATE_ACTIVE	31	/* kernel map update in progress */
>>>> +#define TIF_KMAP_UPDATE_PENDING	32	/* kernel map updated with deferred barriers */
>>>>
>>>>  #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
>>>>  #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
>>>> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
>>>> index 42faebb7b712..1367ec6407d1 100644
>>>> --- a/arch/arm64/kernel/process.c
>>>> +++ b/arch/arm64/kernel/process.c
>>>> @@ -680,10 +680,10 @@ struct task_struct *__switch_to(struct task_struct *prev,
>>>>  	gcs_thread_switch(next);
>>>>
>>>>  	/*
>>>> -	 * Complete any pending TLB or cache maintenance on this CPU in case
>>>> -	 * the thread migrates to a different CPU.
>>>> -	 * This full barrier is also required by the membarrier system
>>>> -	 * call.
>>>> +	 * Complete any pending TLB or cache maintenance on this CPU in case the
>>>> +	 * thread migrates to a different CPU. This full barrier is also
>>>> +	 * required by the membarrier system call. Additionally it is required
>>>> +	 * for TIF_KMAP_UPDATE_PENDING, see below.
>>>>  	 */
>>>>  	dsb(ish);
>>>>
>>>> @@ -696,6 +696,18 @@ struct task_struct *__switch_to(struct task_struct *prev,
>>>>  	/* avoid expensive SCTLR_EL1 accesses if no change */
>>>>  	if (prev->thread.sctlr_user != next->thread.sctlr_user)
>>>>  		update_sctlr_el1(next->thread.sctlr_user);
>>>> +	else if (unlikely(test_thread_flag(TIF_KMAP_UPDATE_PENDING))) {
>>>> +		/*
>>>> +		 * In the unlikely event that a kernel map update is on-going
>>>> +		 * when preemption occurs, we must emit_pte_barriers() if
>>>> +		 * pending. emit_pte_barriers() consists of "dsb(ishst);
>>>> +		 * isb();". The dsb is already handled above. The isb() is
>>>> +		 * handled if update_sctlr_el1() was called, so we only need
>>>> +		 * to emit an isb() here if it wasn't.
>>>> +		 */
>>>> +		isb();
>>>> +		clear_thread_flag(TIF_KMAP_UPDATE_PENDING);
>>>> +	}
>>>>
>>>>  	/* the actual thread switch */
>>>>  	last = cpu_switch_to(prev, next);
>>
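
An aside for anyone skimming the thread: the intended usage pattern of the two
hooks is roughly the sketch below. This is illustrative only;
map_kernel_pages_sketch() is a made-up caller (the real users are generic mm
paths such as vmalloc), and virt_to_kpte() is simply a convenient way to reach
the kernel pte for this example.

/*
 * Illustrative sketch only (not part of the patch): a made-up caller
 * showing how the hooks bracket a batch of kernel PTE writes so that
 * emit_pte_barriers() runs once per batch rather than once per PTE.
 */
static void map_kernel_pages_sketch(unsigned long start, int nr_pages,
				    struct page **pages, pgprot_t prot)
{
	unsigned long end = start + nr_pages * PAGE_SIZE;
	unsigned long addr = start;
	int i;

	/* Sets TIF_KMAP_UPDATE_ACTIVE; barriers are deferred from here. */
	arch_update_kernel_mappings_begin(start, end);

	for (i = 0; i < nr_pages; i++, addr += PAGE_SIZE) {
		/*
		 * queue_pte_barriers() sees the ACTIVE flag and just sets
		 * TIF_KMAP_UPDATE_PENDING instead of emitting dsb+isb.
		 */
		__set_pte(virt_to_kpte(addr), mk_pte(pages[i], prot));
	}

	/* Emits the pending dsb(ishst); isb() exactly once. */
	arch_update_kernel_mappings_end(start, end, 0);
}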