From mboxrd@z Thu Jan  1 00:00:00 1970
From: Zhaoyang Huang <huangzhaoyang@gmail.com>
Date: Tue, 23 Nov 2021 14:48:20 +0800
Subject: Re: [RFC PATCH] arch: arm64: introduce RODATA_FULL_USE_PTE_CONT
To: Ard Biesheuvel
Cc: Catalin Marinas, Will Deacon, Anshuman Khandual, Andrew Morton,
 Nicholas Piggin, Mike Rapoport, Pavel Tatashin, Christophe Leroy,
 Jonathan Marek, Zhaoyang Huang, Linux Memory Management List,
 Linux Kernel Mailing List
In-Reply-To:
References: <1637558929-22971-1-git-send-email-huangzhaoyang@gmail.com>
Message-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
OK. Could the following change be helpful? I think it is safe, as nobody
but the allocator is aware of this memory block, and the attributes stay
consistent for as long as the allocator holds the pages.

diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index c50664b5b72c..0a9da8295d53 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -236,3 +236,45 @@ bool kernel_page_present(struct page *page)
 	ptep = pte_offset_kernel(pmdp, addr);
 	return pte_valid(READ_ONCE(*ptep));
 }
+
+void arch_alloc_page(struct page *page, int order)
+{
+	unsigned long addr;
+	unsigned long cont_pte_low_bound;
+
+	if (!rodata_full)
+		return;
+	addr = (u64)page_address(page);
+	if ((order >= 4) && (addr & ~CONT_PTE_MASK) == 0) {
+		order -= 4;
+		do {
+			cont_pte_low_bound = addr & CONT_PTE_MASK;
+			__change_memory_common(cont_pte_low_bound,
+				(~CONT_PTE_MASK + 1), __pgprot(PTE_CONT), __pgprot(0));
+			addr = (u64)page_address(page);
+			page += 4;
+			order--;
+		} while (order >= 0);
+	}
+}
+
+void arch_free_page(struct page *page, int order)
+{
+	unsigned long addr;
+	unsigned long cont_pte_low_bound;
+
+	if (!rodata_full)
+		return;
+	addr = (u64)page_address(page);
+	if ((order >= 4) && (addr & ~CONT_PTE_MASK) == 0) {
+		order -= 4;
+		do {
+			cont_pte_low_bound = addr & CONT_PTE_MASK;
+			__change_memory_common(cont_pte_low_bound,
+				(~CONT_PTE_MASK + 1), __pgprot(0), __pgprot(PTE_CONT));
+			addr = (u64)page_address(page);
+			page += 4;
+			order--;
+		} while (order >= 0);
+	}
+}
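For context on the magic numbers: with 4K pages a contiguous-PTE group is
CONT_PTES (16) entries, i.e. CONT_PTE_SIZE (64K), so only buddy blocks of at
least order 4 that start on a group boundary are considered. A minimal sketch
of that eligibility check (illustrative only, not part of the diff above;
cont_group_eligible() is a made-up name):

/*
 * Illustrative only: a block can have PTE_CONT applied/cleared per
 * contiguous group if it spans at least one full group (order >= 4
 * with 4K pages, since CONT_PTES == 16) and starts on a
 * CONT_PTE_SIZE-aligned linear address.
 */
static bool cont_group_eligible(unsigned long addr, int order)
{
	/* The block must cover at least one full contiguous group ... */
	if ((1UL << order) < CONT_PTES)
		return false;
	/* ... and must begin on a CONT_PTE_SIZE boundary. */
	return (addr & ~CONT_PTE_MASK) == 0;
}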
On Tue, Nov 23, 2021 at 2:16 AM Ard Biesheuvel wrote:
>
> On Mon, 22 Nov 2021 at 12:25, Zhaoyang Huang wrote:
> >
> > On Mon, Nov 22, 2021 at 5:23 PM Ard Biesheuvel wrote:
> > >
> > > On Mon, 22 Nov 2021 at 10:17, Zhaoyang Huang wrote:
> > > >
> > > > On Mon, Nov 22, 2021 at 5:04 PM Ard Biesheuvel wrote:
> > > > >
> > > > > On Mon, 22 Nov 2021 at 10:00, Zhaoyang Huang wrote:
> > > > > >
> > > > > > On Mon, Nov 22, 2021 at 4:52 PM Ard Biesheuvel wrote:
> > > > > > >
> > > > > > > On Mon, 22 Nov 2021 at 06:29, Huangzhaoyang wrote:
> > > > > > > >
> > > > > > > > From: Zhaoyang Huang
> > > > > > > >
> > > > > > > > Kernel linear mapping will be split to the smallest granularity
> > > > > > > > when RODATA_FULL is applied, which could lead to TLB pressure.
> > > > > > > > Introduce a method to apply PTE_CONT on pte.
> > > > > > > >
> > > > > > > > Signed-off-by: Zhaoyang Huang
> > > > > > >
> > > > > > > How would this lead to TLB pressure, and how does the use of
> > > > > > > contiguous mappings mitigate that? The linear mapping of the
> > > > > > > kernel is rarely used, as all normal accesses to it go via the
> > > > > > > vmalloc region, so in which case would TLB entries be allocated
> > > > > > > for this region in a way that could cause a measurable
> > > > > > > performance impact?
> > > > > >
> > > > > > In fact, the patch is about to use PTE_CONT *OUT OF* the range of
> > > > > > kernel text.
> > > > >
> > > > > OK, I had missed that, apologies.
> > > > >
> > > > > > Would you please have a look at the code? It applies PTE_CONT
> > > > > > during map_mem() and then clears it when load_module() changes
> > > > > > the linear mapping that corresponds to the area it uses in the
> > > > > > vmalloc region.
> > > > >
> > > > > You cannot change the PTE_CONT attribute on live mappings.
> > > >
> > > > Is this an inherent constraint from the ASIC's perspective? Is
> > > > PTE_CONT different from the other attributes?
> > >
> > > The PTE_CONT attributes on live translation entries must always be in
> > > sync in all entries covering the same contiguous group, or you might
> > > cause TLB conflict aborts.
> > >
> > > > > > > >
> > > > > > > > ---
> > > > > > > >  arch/arm64/Kconfig       |  9 +++++++++
> > > > > > > >  arch/arm64/mm/mmu.c      | 10 ++++++++--
> > > > > > > >  arch/arm64/mm/pageattr.c |  9 +++++++++
> > > > > > > >  3 files changed, 26 insertions(+), 2 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > > > > > > index fee914c..3f8fbf0 100644
> > > > > > > > --- a/arch/arm64/Kconfig
> > > > > > > > +++ b/arch/arm64/Kconfig
> > > > > > > > @@ -1198,6 +1198,15 @@ config RODATA_FULL_DEFAULT_ENABLED
> > > > > > > >           This requires the linear region to be mapped down to pages,
> > > > > > > >           which may adversely affect performance in some cases.
> > > > > > > >
> > > > > > > > +config RODATA_FULL_USE_PTE_CONT
> > > > > > > > +       bool "Apply PTE_CONT when RODATA_FULL_DEFAULT_ENABLED is enabled"
> > > > > > > > +       depends on RODATA_FULL_DEFAULT_ENABLED
> > > > > > > > +       default y
> > > > > > > > +       help
> > > > > > > > +         Apply PTE_CONT to the linear mapping as much as possible when
> > > > > > > > +         RODATA_FULL_DEFAULT_ENABLED is enabled, which could reduce the
> > > > > > > > +         performance impact of the small pte granularity.
> > > > > > > > +
> > > > > > > >  config ARM64_SW_TTBR0_PAN
> > > > > > > >         bool "Emulate Privileged Access Never using TTBR0_EL1 switching"
> > > > > > > >         help
> > > > > > > > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > > > > > > > index cfd9deb..8017b17 100644
> > > > > > > > --- a/arch/arm64/mm/mmu.c
> > > > > > > > +++ b/arch/arm64/mm/mmu.c
> > > > > > > > @@ -124,15 +124,21 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
> > > > > > > >          * The following mapping attributes may be updated in live
> > > > > > > >          * kernel mappings without the need for break-before-make.
> > > > > > > >          */
> > > > > > > > +#ifndef CONFIG_RODATA_FULL_USE_PTE_CONT
> > > > > > > >         pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG;
> > > > > > > > +#else
> > > > > > > > +       pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG | PTE_CONT;
> > > > > > > > +#endif
> > > > >
> > > > > What justifies this change? Why is it ok in certain cases to update
> > > > > the PTE_CONT attribute without BBM?
> > > >
> > > > BBM?
> > > > I am not sure whether it is ok here, but I find that nothing else
> > > > changes PTE_CONT so far other than this commit.
> > >
> > > No, it is not ok. BBM == break-before-make, which means you need to
> > > install invalid entries for the entire contiguous group before you can
> > > change the PTE_CONT attributes. As other CPUs may be accessing the
> > > same region, there is no safe way to do this, which is why we don't
> > > permit PTE_CONT to be changed at all.
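(To make sure I understand the constraint: repainting PTE_CONT on a live
group would require something like the break-before-make sequence sketched
below -- illustrative only, assuming 4K pages, and bbm_repaint_cont_group()
is a made-up name rather than an existing or proposed helper. While the
entries are invalid, any concurrent access from another CPU faults, which is
exactly why this cannot be done on the shared linear map.)

/*
 * Illustrative break-before-make sketch, not a proposal: every entry in
 * the contiguous group must be invalidated and the stale TLB entries
 * flushed before entries with the new PTE_CONT state may be written.
 */
static void bbm_repaint_cont_group(pte_t *ptep, unsigned long addr,
				   const pte_t *new_ptes)
{
	int i;

	/* Break: take down every entry in the CONT_PTES-entry group. */
	for (i = 0; i < CONT_PTES; i++)
		set_pte(ptep + i, __pte(0));

	/* Flush the old (possibly contiguous-hinted) TLB entries on all CPUs. */
	flush_tlb_kernel_range(addr, addr + CONT_PTE_SIZE);

	/* Make: only now can the new entries be installed consistently. */
	for (i = 0; i < CONT_PTES; i++)
		set_pte(ptep + i, new_ptes[i]);
}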
> > ok, got it. But from a practical point of view, could we assume that
> > the corresponding linear mapping of a vmalloc area would NOT be
> > accessed at all? If so, could we just clear PTE_VALID on that linear
> > pte and leave PTE_CONT on the other ptes within the same group? The
> > linear mapping could be re-established when the vmalloc area is freed.
>
> No. Such an invalid entry would conflict with the other PTE_CONT ones
> in the same contiguous group.
>
> > > > > > > >
> > > > > > > >         /* creating or taking down mappings is always safe */
> > > > > > > >         if (old == 0 || new == 0)
> > > > > > > >                 return true;
> > > > > > > >
> > > > > > > >         /* live contiguous mappings may not be manipulated at all */
> > > > > > > > -       if ((old | new) & PTE_CONT)
> > > > > > > > +#ifndef CONFIG_RODATA_FULL_USE_PTE_CONT
> > > > > > > > +       if ((old | new) & PTE_CONT)
> > > > > > > >                 return false;
> > > > > > > > +#endif
> > > > > > > >
> > > > > > > >         /* Transitioning from Non-Global to Global is unsafe */
> > > > > > > >         if (old & ~new & PTE_NG)
> > > > > > > > @@ -206,7 +212,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
> > > > > > > >
> > > > > > > >         /* use a contiguous mapping if the range is suitably aligned */
> > > > > > > >         if ((((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
> > > > > > > > -           (flags & NO_CONT_MAPPINGS) == 0)
> > > > > > > > +           (IS_ENABLED(CONFIG_RODATA_FULL_USE_PTE_CONT) || (flags & NO_CONT_MAPPINGS) == 0))
> > > > > > > >                 __prot = __pgprot(pgprot_val(prot) | PTE_CONT);
> > > > > > > >
> > > > > > > >         init_pte(pmdp, addr, next, phys, __prot);
> > > > > > > > diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
> > > > > > > > index a3bacd7..88a87eb 100644
> > > > > > > > --- a/arch/arm64/mm/pageattr.c
> > > > > > > > +++ b/arch/arm64/mm/pageattr.c
> > > > > > > > @@ -99,6 +99,15 @@ static int change_memory_common(unsigned long addr, int numpages,
> > > > > > > >         if (rodata_full && (pgprot_val(set_mask) == PTE_RDONLY ||
> > > > > > > >                             pgprot_val(clear_mask) == PTE_RDONLY)) {
> > > > > > > >                 for (i = 0; i < area->nr_pages; i++) {
> > > > > > > > +#ifdef CONFIG_RODATA_FULL_USE_PTE_CONT
> > > > > > > > +                       unsigned long cont_pte_low_bound;
> > > > > > > > +                       unsigned long addr;
> > > > > > > > +
> > > > > > > > +                       addr = (u64)page_address(area->pages[i]);
> > > > > > > > +                       cont_pte_low_bound = addr & CONT_PTE_MASK;
> > > > > > > > +                       __change_memory_common(cont_pte_low_bound,
> > > > > > > > +                               (~CONT_PTE_MASK + 1), __pgprot(0), __pgprot(PTE_CONT));
> > > > > > > > +#endif
> > > > > > > >                         __change_memory_common((u64)page_address(area->pages[i]),
> > > > > > > >                                                PAGE_SIZE, set_mask, clear_mask);
> > > > > > > >                 }
> > > > > > > > --
> > > > > > > > 1.9.1
> > > > > > > >
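For reference on what the __change_memory_common() calls above do: they walk
the given range, set/clear the requested attribute masks on each pte, and
then flush the TLB for the range. Roughly, as the helper looks in
arch/arm64/mm/pageattr.c around this time (simplified from memory; treat it
as a sketch rather than a verbatim quote):

struct page_change_data {
	pgprot_t set_mask;
	pgprot_t clear_mask;
};

static int change_page_range(pte_t *ptep, unsigned long addr, void *data)
{
	struct page_change_data *cdata = data;
	pte_t pte = READ_ONCE(*ptep);

	/* Drop the bits in clear_mask, then set the bits in set_mask. */
	pte = clear_pte_bit(pte, cdata->clear_mask);
	pte = set_pte_bit(pte, cdata->set_mask);

	set_pte(ptep, pte);
	return 0;
}

static int __change_memory_common(unsigned long start, unsigned long size,
				  pgprot_t set_mask, pgprot_t clear_mask)
{
	struct page_change_data data = {
		.set_mask	= set_mask,
		.clear_mask	= clear_mask,
	};
	int ret;

	/* Apply change_page_range() to every pte mapping [start, start + size). */
	ret = apply_to_page_range(&init_mm, start, size, change_page_range,
				  &data);

	/* Stale attributes must not linger in the TLB. */
	flush_tlb_kernel_range(start, start + size);
	return ret;
}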