Subject: Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone
From: Alex Ghiti
To: Palmer Dabbelt
Cc: mpe@ellerman.id.au, benh@kernel.crashing.org, paulus@samba.org, Paul Walmsley, aou@eecs.berkeley.edu, Anup Patel, Atish Patra, zong.li@sifive.com, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-mm@kvack.org
Message-ID: <7cb2285e-68ba-6827-5e61-e33a4b65ac03@ghiti.fr>
Date: Tue, 21 Jul 2020 14:36:10 -0400

Let's try to make progress here: I'm adding linux-mm in CC to get feedback on
this patch, as it blocks sv48 support too.

Alex

On 7/9/20 at 7:11 AM, Alex Ghiti wrote:
> Hi Palmer,
>
> On 7/9/20 at 1:05 AM, Palmer Dabbelt wrote:
>> On Sun, 07 Jun 2020 00:59:46 PDT (-0700), alex@ghiti.fr wrote:
>>> This is a preparatory patch for the relocatable kernel.
>>>
>>> The kernel used to be linked at the PAGE_OFFSET address and loaded
>>> physically at the beginning of the main memory.
Therefore, we could use
>>> the linear mapping for the kernel mapping.
>>>
>>> But the relocated kernel base address will be different from PAGE_OFFSET
>>> and, since in the linear mapping two different virtual addresses cannot
>>> point to the same physical address, the kernel mapping needs to lie
>>> outside the linear mapping.
>>
>> I know it's been a while, but I keep opening this up to review it and just
>> can't get over how ugly it is to put the kernel's linear map in the vmalloc
>> region.
>>
>> I guess I don't understand why this is necessary at all.  Specifically: why
>> can't we just relocate the kernel within the linear map?  That would let the
>> bootloader put the kernel wherever it wants, modulo the physical memory size we
>> support.  We'd need to handle the regions that are coupled to the kernel's
>> execution address, but we could just put them in an explicit memory region
>> which is what we should probably be doing anyway.
>
> Virtual relocation in the linear mapping requires moving the kernel
> physically too. Zong implemented this physical move in his KASLR RFC
> patchset, which is cumbersome since finding an available physical spot
> is harder than just selecting a virtual range in the vmalloc region.
>
> In addition, having the kernel mapping in the linear mapping prevents
> the use of hugepages for the linear mapping, resulting in a performance loss
> (at least for the GB that encompasses the kernel).
>
> Why do you find this "ugly"? The vmalloc region is just a bunch of
> available virtual addresses for whatever purpose we want, and as noted by
> Zong, arm64 uses the same scheme.
>
>>
>>> In addition, because modules and BPF must be close to the kernel (inside
>>> a +-2GB window), the kernel is placed at the end of the vmalloc zone minus
>>> 2GB, which leaves room for modules and BPF.
The kernel could not be
>>> placed at the beginning of the vmalloc zone since other vmalloc
>>> allocations from the kernel could get all the +-2GB window around the
>>> kernel, which would prevent new modules and BPF programs from being loaded.
>>
>> Well, that's not enough to make sure this doesn't happen -- it's just enough to
>> make sure it doesn't happen very quickly.  That's the same boat we're already
>> in, though, so it's not like it's worse.
>
> Indeed, it's not worse; I haven't found a way to reserve a vmalloc area
> without actually allocating it.
>
>>
>>> Signed-off-by: Alexandre Ghiti
>>> Reviewed-by: Zong Li
>>> ---
>>>  arch/riscv/boot/loader.lds.S     |  3 +-
>>>  arch/riscv/include/asm/page.h    | 10 +++++-
>>>  arch/riscv/include/asm/pgtable.h | 38 ++++++++++++++-------
>>>  arch/riscv/kernel/head.S         |  3 +-
>>>  arch/riscv/kernel/module.c       |  4 +--
>>>  arch/riscv/kernel/vmlinux.lds.S  |  3 +-
>>>  arch/riscv/mm/init.c             | 58 +++++++++++++++++++++++++-------
>>>  arch/riscv/mm/physaddr.c         |  2 +-
>>>  8 files changed, 88 insertions(+), 33 deletions(-)
>>>
>>> diff --git a/arch/riscv/boot/loader.lds.S b/arch/riscv/boot/loader.lds.S
>>> index 47a5003c2e28..62d94696a19c 100644
>>> --- a/arch/riscv/boot/loader.lds.S
>>> +++ b/arch/riscv/boot/loader.lds.S
>>> @@ -1,13 +1,14 @@
>>>  /* SPDX-License-Identifier: GPL-2.0 */
>>>
>>>  #include
>>> +#include
>>>
>>>  OUTPUT_ARCH(riscv)
>>>  ENTRY(_start)
>>>
>>>  SECTIONS
>>>  {
>>> -    . = PAGE_OFFSET;
>>> +    . = KERNEL_LINK_ADDR;
>>>
>>>      .payload : {
>>>          *(.payload)
>>> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
>>> index 2d50f76efe48..48bb09b6a9b7 100644
>>> --- a/arch/riscv/include/asm/page.h
>>> +++ b/arch/riscv/include/asm/page.h
>>> @@ -90,18 +90,26 @@ typedef struct page *pgtable_t;
>>>
>>>  #ifdef CONFIG_MMU
>>>  extern unsigned long va_pa_offset;
>>> +extern unsigned long va_kernel_pa_offset;
>>>  extern unsigned long pfn_base;
>>>  #define ARCH_PFN_OFFSET        (pfn_base)
>>>  #else
>>>  #define va_pa_offset        0
>>> +#define va_kernel_pa_offset    0
>>>  #define ARCH_PFN_OFFSET        (PAGE_OFFSET >> PAGE_SHIFT)
>>>  #endif /* CONFIG_MMU */
>>>
>>>  extern unsigned long max_low_pfn;
>>>  extern unsigned long min_low_pfn;
>>> +extern unsigned long kernel_virt_addr;
>>>
>>>  #define __pa_to_va_nodebug(x)    ((void *)((unsigned long) (x) + va_pa_offset))
>>> -#define __va_to_pa_nodebug(x)    ((unsigned long)(x) - va_pa_offset)
>>> +#define linear_mapping_va_to_pa(x)    ((unsigned long)(x) - va_pa_offset)
>>> +#define kernel_mapping_va_to_pa(x)    \
>>> +    ((unsigned long)(x) - va_kernel_pa_offset)
>>> +#define __va_to_pa_nodebug(x)        \
>>> +    (((x) >= PAGE_OFFSET) ?        \
>>> +        linear_mapping_va_to_pa(x) : kernel_mapping_va_to_pa(x))
>>>
>>>  #ifdef CONFIG_DEBUG_VIRTUAL
>>>  extern phys_addr_t __virt_to_phys(unsigned long x);
>>> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 35b60035b6b0..94ef3b49dfb6 100644
>>> --- a/arch/riscv/include/asm/pgtable.h
>>> +++ b/arch/riscv/include/asm/pgtable.h
>>> @@ -11,23 +11,29 @@
>>>
>>>  #include
>>>
>>> -#ifndef __ASSEMBLY__
>>> -
>>> -/* Page Upper Directory not used in RISC-V */
>>> -#include
>>> -#include
>>> -#include
>>> -#include
>>> -
>>> -#ifdef CONFIG_MMU
>>> +#ifndef CONFIG_MMU
>>> +#define KERNEL_VIRT_ADDR    PAGE_OFFSET
>>> +#define KERNEL_LINK_ADDR    PAGE_OFFSET
>>> +#else
>>> +/*
>>> + * Leave 2GB for modules and BPF that must lie within a 2GB range around
>>> + * the kernel.
>>> + */
>>> +#define KERNEL_VIRT_ADDR    (VMALLOC_END - SZ_2G + 1)
>>> +#define KERNEL_LINK_ADDR    KERNEL_VIRT_ADDR
>>
>> At a bare minimum this is going to make a mess of the 32-bit port, as
>> non-relocatable kernels are now going to get linked at 1GiB, which is where user
>> code is supposed to live.  That's an easy fix, though, as the 32-bit stuff
>> doesn't need any module address restrictions.
>
> Indeed, I will take a look at that.
>
>>
>>>  #define VMALLOC_SIZE     (KERN_VIRT_SIZE >> 1)
>>>  #define VMALLOC_END      (PAGE_OFFSET - 1)
>>>  #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)
>>>
>>>  #define BPF_JIT_REGION_SIZE    (SZ_128M)
>>> -#define BPF_JIT_REGION_START    (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
>>> -#define BPF_JIT_REGION_END    (VMALLOC_END)
>>> +#define BPF_JIT_REGION_START    PFN_ALIGN((unsigned long)&_end)
>>> +#define BPF_JIT_REGION_END    (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE)
>>> +
>>> +#ifdef CONFIG_64BIT
>>> +#define VMALLOC_MODULE_START    BPF_JIT_REGION_END
>>> +#define VMALLOC_MODULE_END    (((unsigned long)&_start & PAGE_MASK) + SZ_2G)
>>> +#endif
>>>
>>>  /*
>>>   * Roughly size the vmemmap space to be large enough to fit enough
>>> @@ -57,9 +63,16 @@
>>>  #define FIXADDR_SIZE    PGDIR_SIZE
>>>  #endif
>>>  #define FIXADDR_START    (FIXADDR_TOP - FIXADDR_SIZE)
>>> -
>>>  #endif
>>>
>>> +#ifndef __ASSEMBLY__
>>> +
>>> +/* Page Upper Directory not used in RISC-V */
>>> +#include
>>> +#include
>>> +#include
>>> +#include
>>> +
>>>  #ifdef CONFIG_64BIT
>>>  #include
>>>  #else
>>> @@ -483,6 +496,7 @@ static inline void __kernel_map_pages(struct page *page, int numpages, int enabl
>>>
>>>  #define kern_addr_valid(addr)   (1) /* FIXME */
>>>
>>> +extern char _start[];
>>>  extern void *dtb_early_va;
>>>  void setup_bootmem(void);
>>>  void paging_init(void);
>>> diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
>>> index 98a406474e7d..8f5bb7731327 100644
>>> --- a/arch/riscv/kernel/head.S
>>> +++ b/arch/riscv/kernel/head.S
>>> @@ -49,7 +49,8 @@ ENTRY(_start)
>>>  #ifdef CONFIG_MMU
>>>  relocate:
>>>      /* Relocate return address */
>>> -    li a1, PAGE_OFFSET
>>> +    la a1, kernel_virt_addr
>>> +    REG_L a1, 0(a1)
>>>      la a2, _start
>>>      sub a1, a1, a2
>>>      add ra, ra, a1
>>> diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
>>> index 8bbe5dbe1341..1a8fbe05accf 100644
>>> --- a/arch/riscv/kernel/module.c
>>> +++ b/arch/riscv/kernel/module.c
>>> @@ -392,12 +392,10 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
>>>  }
>>>
>>>  #if defined(CONFIG_MMU) && defined(CONFIG_64BIT)
>>> -#define VMALLOC_MODULE_START \
>>> -     max(PFN_ALIGN((unsigned long)&_end - SZ_2G), VMALLOC_START)
>>>  void *module_alloc(unsigned long size)
>>>  {
>>>      return __vmalloc_node_range(size, 1, VMALLOC_MODULE_START,
>>> -                    VMALLOC_END, GFP_KERNEL,
>>> +                    VMALLOC_MODULE_END, GFP_KERNEL,
>>>                      PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
>>>                      __builtin_return_address(0));
>>>  }
>>> diff --git a/arch/riscv/kernel/vmlinux.lds.S b/arch/riscv/kernel/vmlinux.lds.S
>>> index 0339b6bbe11a..a9abde62909f 100644
>>> --- a/arch/riscv/kernel/vmlinux.lds.S
>>> +++ b/arch/riscv/kernel/vmlinux.lds.S
>>> @@ -4,7 +4,8 @@
>>>   * Copyright (C) 2017 SiFive
>>>   */
>>>
>>> -#define LOAD_OFFSET PAGE_OFFSET
>>> +#include
>>> +#define LOAD_OFFSET KERNEL_LINK_ADDR
>>>  #include
>>>  #include
>>>  #include
>>> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
>>> index 736de6c8739f..71da78914645 100644
>>> --- a/arch/riscv/mm/init.c
>>> +++ b/arch/riscv/mm/init.c
>>> @@ -22,6 +22,9 @@
>>>
>>>  #include "../kernel/head.h"
>>>
>>> +unsigned long kernel_virt_addr = KERNEL_VIRT_ADDR;
>>> +EXPORT_SYMBOL(kernel_virt_addr);
>>> +
>>>  unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)]
>>>                             __page_aligned_bss;
>>>  EXPORT_SYMBOL(empty_zero_page);
>>> @@ -178,8 +181,12 @@ void __init setup_bootmem(void)
>>>  }
>>>
>>>  #ifdef CONFIG_MMU
>>> +/* Offset between linear mapping virtual address and kernel load address */
>>>  unsigned long va_pa_offset;
>>>  EXPORT_SYMBOL(va_pa_offset);
>>> +/* Offset between kernel mapping virtual address and kernel load address */
>>> +unsigned long va_kernel_pa_offset;
>>> +EXPORT_SYMBOL(va_kernel_pa_offset);
>>>  unsigned long pfn_base;
>>>  EXPORT_SYMBOL(pfn_base);
>>>
>>> @@ -271,7 +278,7 @@ static phys_addr_t __init alloc_pmd(uintptr_t va)
>>>      if (mmu_enabled)
>>>          return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
>>>
>>> -    pmd_num = (va - PAGE_OFFSET) >> PGDIR_SHIFT;
>>> +    pmd_num = (va - kernel_virt_addr) >> PGDIR_SHIFT;
>>>      BUG_ON(pmd_num >= NUM_EARLY_PMDS);
>>>      return (uintptr_t)&early_pmd[pmd_num * PTRS_PER_PMD];
>>>  }
>>> @@ -372,14 +379,30 @@ static uintptr_t __init best_map_size(phys_addr_t base, phys_addr_t size)
>>>  #error "setup_vm() is called from head.S before relocate so it should not use absolute addressing."
>>>  #endif
>>>
>>> +static uintptr_t load_pa, load_sz;
>>> +
>>> +static void __init create_kernel_page_table(pgd_t *pgdir, uintptr_t map_size)
>>> +{
>>> +    uintptr_t va, end_va;
>>> +
>>> +    end_va = kernel_virt_addr + load_sz;
>>> +    for (va = kernel_virt_addr; va < end_va; va += map_size)
>>> +        create_pgd_mapping(pgdir, va,
>>> +                   load_pa + (va - kernel_virt_addr),
>>> +                   map_size, PAGE_KERNEL_EXEC);
>>> +}
>>> +
>>>  asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>>>  {
>>>      uintptr_t va, end_va;
>>> -    uintptr_t load_pa = (uintptr_t)(&_start);
>>> -    uintptr_t load_sz = (uintptr_t)(&_end) - load_pa;
>>>      uintptr_t map_size = best_map_size(load_pa, MAX_EARLY_MAPPING_SIZE);
>>>
>>> +    load_pa = (uintptr_t)(&_start);
>>> +    load_sz = (uintptr_t)(&_end) - load_pa;
>>> +
>>>      va_pa_offset = PAGE_OFFSET - load_pa;
>>> +    va_kernel_pa_offset = kernel_virt_addr - load_pa;
>>> +
>>>      pfn_base = PFN_DOWN(load_pa);
>>>
>>>      /*
>>> @@ -402,26 +425,22 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>>>      create_pmd_mapping(fixmap_pmd, FIXADDR_START,
>>>                 (uintptr_t)fixmap_pte, PMD_SIZE, PAGE_TABLE);
>>>      /* Setup trampoline PGD and PMD */
>>> -    create_pgd_mapping(trampoline_pg_dir, PAGE_OFFSET,
>>> +    create_pgd_mapping(trampoline_pg_dir, kernel_virt_addr,
>>>                 (uintptr_t)trampoline_pmd, PGDIR_SIZE, PAGE_TABLE);
>>> -    create_pmd_mapping(trampoline_pmd, PAGE_OFFSET,
>>> +    create_pmd_mapping(trampoline_pmd, kernel_virt_addr,
>>>                 load_pa, PMD_SIZE, PAGE_KERNEL_EXEC);
>>>  #else
>>>      /* Setup trampoline PGD */
>>> -    create_pgd_mapping(trampoline_pg_dir, PAGE_OFFSET,
>>> +    create_pgd_mapping(trampoline_pg_dir, kernel_virt_addr,
>>>                 load_pa, PGDIR_SIZE, PAGE_KERNEL_EXEC);
>>>  #endif
>>>
>>>      /*
>>> -     * Setup early PGD covering entire kernel which will allows
>>> +     * Setup early PGD covering entire kernel which will allow
>>>       * us to reach paging_init(). We map all memory banks later
>>>       * in setup_vm_final() below.
>>>       */
>>> -    end_va = PAGE_OFFSET + load_sz;
>>> -    for (va = PAGE_OFFSET; va < end_va; va += map_size)
>>> -        create_pgd_mapping(early_pg_dir, va,
>>> -                   load_pa + (va - PAGE_OFFSET),
>>> -                   map_size, PAGE_KERNEL_EXEC);
>>> +    create_kernel_page_table(early_pg_dir, map_size);
>>>
>>>      /* Create fixed mapping for early FDT parsing */
>>>      end_va = __fix_to_virt(FIX_FDT) + FIX_FDT_SIZE;
>>> @@ -441,6 +460,7 @@ static void __init setup_vm_final(void)
>>>      uintptr_t va, map_size;
>>>      phys_addr_t pa, start, end;
>>>      struct memblock_region *reg;
>>> +    static struct vm_struct vm_kernel = { 0 };
>>>
>>>      /* Set mmu_enabled flag */
>>>      mmu_enabled = true;
>>> @@ -467,10 +487,22 @@ static void __init setup_vm_final(void)
>>>          for (pa = start; pa < end; pa += map_size) {
>>>              va = (uintptr_t)__va(pa);
>>>              create_pgd_mapping(swapper_pg_dir, va, pa,
>>> -                       map_size, PAGE_KERNEL_EXEC);
>>> +                       map_size, PAGE_KERNEL);
>>>          }
>>>      }
>>>
>>> +    /* Map the kernel */
>>> +    create_kernel_page_table(swapper_pg_dir, PMD_SIZE);
>>> +
>>> +    /* Reserve the vmalloc area occupied by the kernel */
>>> +    vm_kernel.addr = (void *)kernel_virt_addr;
>>> +    vm_kernel.phys_addr = load_pa;
>>> +    vm_kernel.size = (load_sz + PMD_SIZE - 1) & ~(PMD_SIZE - 1);
>>> +    vm_kernel.flags = VM_MAP | VM_NO_GUARD;
>>> +    vm_kernel.caller = __builtin_return_address(0);
>>> +
>>> +    vm_area_add_early(&vm_kernel);
>>> +
>>>      /* Clear fixmap PTE and PMD mappings */
>>>      clear_fixmap(FIX_PTE);
>>>      clear_fixmap(FIX_PMD);
>>> diff --git a/arch/riscv/mm/physaddr.c b/arch/riscv/mm/physaddr.c
>>> index e8e4dcd39fed..35703d5ef5fd 100644
>>> --- a/arch/riscv/mm/physaddr.c
>>> +++ b/arch/riscv/mm/physaddr.c
>>> @@ -23,7 +23,7 @@ EXPORT_SYMBOL(__virt_to_phys);
>>>
>>>  phys_addr_t __phys_addr_symbol(unsigned long x)
>>>  {
>>> -    unsigned long kernel_start = (unsigned long)PAGE_OFFSET;
>>> +    unsigned long kernel_start = (unsigned long)kernel_virt_addr;
>>>      unsigned long kernel_end = (unsigned long)_end;
>>>
>>>      /*
>
> Alex