From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B5122C433F5 for ; Mon, 22 Nov 2021 11:25:27 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1EBE56B0071; Mon, 22 Nov 2021 06:25:12 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 19C5B6B0072; Mon, 22 Nov 2021 06:25:12 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 065C26B0073; Mon, 22 Nov 2021 06:25:12 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0192.hostedemail.com [216.40.44.192]) by kanga.kvack.org (Postfix) with ESMTP id ECBA66B0071 for ; Mon, 22 Nov 2021 06:25:11 -0500 (EST) Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 88874180AB9E1 for ; Mon, 22 Nov 2021 11:25:01 +0000 (UTC) X-FDA: 78836334402.29.383BAA4 Received: from mail-qv1-f44.google.com (mail-qv1-f44.google.com [209.85.219.44]) by imf13.hostedemail.com (Postfix) with ESMTP id E684710529B0 for ; Mon, 22 Nov 2021 11:24:58 +0000 (UTC) Received: by mail-qv1-f44.google.com with SMTP id m17so12143019qvx.8 for ; Mon, 22 Nov 2021 03:25:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=S7GpUDxMNlsc+fb8R8i2IvrXfH6R7syAYUue6uwl3OU=; b=JZl0pEcSFxkldvAQRSkj3ZUPegR/ZVNwt4oYymj4AhOwfUvaFtIT8s/1iYuH07jzLb on2f+/z+e3kQ+kenDMKtpPiaFKrDDomicbRJHx/fhFIqxwRzyfTulHTcp+kRj6zQNXIA THL37L6orgPjrYVUsW65H+SbPJhYxUFddv2XAUVopZx6Ux5oV55FIqqZSj7wrRvk2omu rGQ+taLfOdmAr3In7TmEQm1irtY6xtYw+up86ISkLwlhrRc2yPrDa2FzDV5LpLjysKwO sz9AzZU48dS+edE62lvwbF16hrhkbXLZ+px+kp9nEhDJ9XtSRxmwxgw7eQviecytn7a+ 1tKw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=S7GpUDxMNlsc+fb8R8i2IvrXfH6R7syAYUue6uwl3OU=; b=vEo7GsEYwhBFeKVf2+YKKQfrgwqcxJ9maM7q7y6MPhxIHirpR7R9itm+ZluQcJhKZr nXSxPlVqzcQdEGvn2PVGeIKjp7igPvzn2WGdGwNby+cFydC8S1WGrEnU45XYzcr3fmK+ KR6VVH49sKBpTgE5dGhArFPD8kYzHxs8OnkwzLSo4JhRb/KprEjwPuJ9opqmy9TrIwBo vsiIMQ5m2pkY0IaS1+G8z+klpr/T/Mn9lPvqjFOxsycRo1IV3tBiwusiq7wgapebJZpy fcpsSvE66L4FgA/CpDuD5incN2cPokgyWt0WASXFOh4pGT8MgSAcMM5zQNj0xQILyfCH fFYw== X-Gm-Message-State: AOAM530P1Fffnd14CbBOQxOFtBbS9OclQeyrZlJIMmva0zSF989h1qpW 7j575GqYa0sCpz6UjNz4kTwnyi/0Ev92jjnPsqQ= X-Google-Smtp-Source: ABdhPJyqfr8HbC3TvbFrm6P9lZ+slO+iD6NyFrFJzVjqgNqRKecLFoFVVBlMx3wZoLFTM4OHuv9Ye6rgmREd/OigpTc= X-Received: by 2002:a05:6214:d8e:: with SMTP id e14mr99344013qve.37.1637580300473; Mon, 22 Nov 2021 03:25:00 -0800 (PST) MIME-Version: 1.0 References: <1637558929-22971-1-git-send-email-huangzhaoyang@gmail.com> In-Reply-To: From: Zhaoyang Huang Date: Mon, 22 Nov 2021 19:24:39 +0800 Message-ID: Subject: Re: [RFC PATCH] arch: arm64: introduce RODATA_FULL_USE_PTE_CONT To: Ard Biesheuvel Cc: Catalin Marinas , Will Deacon , Anshuman Khandual , Andrew Morton , Nicholas Piggin , Mike Rapoport , Pavel Tatashin , Christophe Leroy , Jonathan Marek , Zhaoyang Huang , Linux Memory Management List , Linux Kernel Mailing List Content-Type: text/plain; charset="UTF-8" X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: E684710529B0 X-Stat-Signature: nhncqytdr4fs3jpqazo1ffk3wn5n7dcx Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=JZl0pEcS; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf13.hostedemail.com: domain of huangzhaoyang@gmail.com designates 209.85.219.44 as permitted sender) smtp.mailfrom=huangzhaoyang@gmail.com X-HE-Tag: 1637580298-632418 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, Nov 22, 2021 at 5:23 PM Ard Biesheuvel wrote: > > On Mon, 22 Nov 2021 at 10:17, Zhaoyang Huang wrote: > > > > On Mon, Nov 22, 2021 at 5:04 PM Ard Biesheuvel wrote: > > > > > > On Mon, 22 Nov 2021 at 10:00, Zhaoyang Huang wrote: > > > > > > > > On Mon, Nov 22, 2021 at 4:52 PM Ard Biesheuvel wrote: > > > > > > > > > > On Mon, 22 Nov 2021 at 06:29, Huangzhaoyang wrote: > > > > > > > > > > > > From: Zhaoyang Huang > > > > > > > > > > > > Kernel linear mapping will be split to the smallest granularity when > > > > > > RODATA_FULL applied, which could lead to TLB pressure. Introduce a method > > > > > > to apply PTE_CONT on pte. > > > > > > > > > > > > Signed-off-by: Zhaoyang Huang > > > > > > > > > > How would this lead to TLB pressure, and how does the use of > > > > > contiguous mappings mitigate that? The linear mapping of the kernel is > > > > > rarely used, as all normal accesses to it go via the vmalloc region, > > > > > so in which case would TLB entries be allocated for this region in a > > > > > way that could cause a measurable performance impact? > > > > In fact, the patch is about to use PTE_CONT *OUT OF* the range of > > > > kernel text. > > > > > > OK, I had missed that, apologies. > > > > > > > Would you please have a look at the code. It apply > > > > PTE_CONT during map_mem and then clear it when load_module change the > > > > corresponding linear mapping to the area it use in vmalloc area. > > > > > > You cannot change the PTE_CONT attribute on live mappings. > > Is it an inner constraint from ASIC perspective? PTE_CONT is different > > to other attributes? > > The PTE_CONT attributes on live translation entries must always be in > sync in all entries covering the same contiguous group, or you might > cause TLB conflict aborts. > > > > > > > > > > > > > > > > > > > > --- > > > > > > arch/arm64/Kconfig | 9 +++++++++ > > > > > > arch/arm64/mm/mmu.c | 10 ++++++++-- > > > > > > arch/arm64/mm/pageattr.c | 9 +++++++++ > > > > > > 3 files changed, 26 insertions(+), 2 deletions(-) > > > > > > > > > > > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > > > > > > index fee914c..3f8fbf0 100644 > > > > > > --- a/arch/arm64/Kconfig > > > > > > +++ b/arch/arm64/Kconfig > > > > > > @@ -1198,6 +1198,15 @@ config RODATA_FULL_DEFAULT_ENABLED > > > > > > This requires the linear region to be mapped down to pages, > > > > > > which may adversely affect performance in some cases. > > > > > > > > > > > > +config RODATA_FULL_USE_PTE_CONT > > > > > > + bool "Apply PTE_CONT when RODATA_FULL_DEFAULT_ENABLED enabled" > > > > > > + depends on RODATA_FULL_DEFAULT_ENABLED > > > > > > + default y > > > > > > + help > > > > > > + Apply PTE_CONT on linear mapping as much as we can when > > > > > > + RODATA_FULL_DEFAULT_ENABLED enabled which could decrease the > > > > > > + impaction on performance by small pte granularity. > > > > > > + > > > > > > config ARM64_SW_TTBR0_PAN > > > > > > bool "Emulate Privileged Access Never using TTBR0_EL1 switching" > > > > > > help > > > > > > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c > > > > > > index cfd9deb..8017b17 100644 > > > > > > --- a/arch/arm64/mm/mmu.c > > > > > > +++ b/arch/arm64/mm/mmu.c > > > > > > @@ -124,15 +124,21 @@ static bool pgattr_change_is_safe(u64 old, u64 new) > > > > > > * The following mapping attributes may be updated in live > > > > > > * kernel mappings without the need for break-before-make. > > > > > > */ > > > > > > +#ifndef CONFIG_RODATA_FULL_USE_PTE_CONT > > > > > > pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG; > > > > > > +#else > > > > > > + pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG | PTE_CONT; > > > > > > +#endif > > > > > > What justifies this change? Why is it ok in certain cases to update > > > the PTE_CONT attribute without BBM? > > BBM? > > I am not sure if it is ok here, but I just find nobody else change > > PTE_CONT so far other than this commit. > > No it is not ok. BBM == break before make, which means you need to > install invalid entries for the entire contiguous group before you can > change the PTE_CONT attributes. As other CPUs may be accessing the > same region, there is no safe way to do this, which is why we don't > permit PTE_CONT to be changed at all. ok, got it. But from practical scenario perspective, could we assume that the corresponding linear map of the vmalloc area would NOT be accessed at all. If it does so, could we just clear PTE_VALID on the linear pte and leave PTE_CONT on the others' pte within the same region. The linear mapping could be reestablished when vmalloc_free. > > > > > > > > > > > > > > > > > /* creating or taking down mappings is always safe */ > > > > > > if (old == 0 || new == 0) > > > > > > return true; > > > > > > > > > > > > /* live contiguous mappings may not be manipulated at all */ > > > > > > - if ((old | new) & PTE_CONT) > > > > > > +#ifndef CONFIG_RODATA_FULL_USE_PTE_CONT > > > > > > + if ((old | new) & PTE_CONT) > > > > > > return false; > > > > > > +#endif > > > > > > > > > > > > /* Transitioning from Non-Global to Global is unsafe */ > > > > > > if (old & ~new & PTE_NG) > > > > > > @@ -206,7 +212,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr, > > > > > > > > > > > > /* use a contiguous mapping if the range is suitably aligned */ > > > > > > if ((((addr | next | phys) & ~CONT_PTE_MASK) == 0) && > > > > > > - (flags & NO_CONT_MAPPINGS) == 0) > > > > > > + (IS_ENABLED(CONFIG_RODATA_FULL_USE_PTE_CONT) || (flags & NO_CONT_MAPPINGS) == 0)) > > > > > > __prot = __pgprot(pgprot_val(prot) | PTE_CONT); > > > > > > > > > > > > init_pte(pmdp, addr, next, phys, __prot); > > > > > > diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c > > > > > > index a3bacd7..88a87eb 100644 > > > > > > --- a/arch/arm64/mm/pageattr.c > > > > > > +++ b/arch/arm64/mm/pageattr.c > > > > > > @@ -99,6 +99,15 @@ static int change_memory_common(unsigned long addr, int numpages, > > > > > > if (rodata_full && (pgprot_val(set_mask) == PTE_RDONLY || > > > > > > pgprot_val(clear_mask) == PTE_RDONLY)) { > > > > > > for (i = 0; i < area->nr_pages; i++) { > > > > > > +#ifdef CONFIG_RODATA_FULL_USE_PTE_CONT > > > > > > + unsigned long cont_pte_low_bound; > > > > > > + unsigned long addr; > > > > > > + > > > > > > + addr = (u64)page_address(area->pages[i]); > > > > > > + cont_pte_low_bound = addr & CONT_PTE_MASK; > > > > > > + __change_memory_common(cont_pte_low_bound, > > > > > > + (~CONT_PTE_MASK + 1), __pgprot(0) , __pgprot(PTE_CONT)); > > > > > > +#endif > > > > > > __change_memory_common((u64)page_address(area->pages[i]), > > > > > > PAGE_SIZE, set_mask, clear_mask); > > > > > > } > > > > > > -- > > > > > > 1.9.1 > > > > > >