Subject: Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone
To: Palmer Dabbelt
Cc: mpe@ellerman.id.au, benh@kernel.crashing.org, paulus@samba.org,
 Paul Walmsley, aou@eecs.berkeley.edu, Anup Patel, Atish Patra,
 zong.li@sifive.com, linux-kernel@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org,
 linux-mm@kvack.org
From: Alex Ghiti
Date: Thu, 23 Jul 2020 01:32:46 -0400

Hi Palmer,

On 7/21/20 at 3:05 PM, Palmer Dabbelt wrote:
> On Tue, 21 Jul 2020 11:36:10 PDT (-0700), alex@ghiti.fr wrote:
>> Let's try to make progress here: I've added linux-mm in CC to get feedback
>> on this patch as it blocks sv48 support too.
>
> Sorry for being slow here.  I haven't replied because I hadn't really
> fleshed

No problem :)

> out the design yet, but just so everyone's on the same page my problems
> with this are:
>
> * We waste vmalloc space on 32-bit systems, where there isn't a lot of it.
> * On 64-bit systems the VA space around the kernel is precious because
>   it's the only place we can place text (modules, BPF, whatever).  If we
>   start putting the kernel in the vmalloc space then we either have to
>   pre-allocate a bunch of space around it (essentially making it a fixed
>   mapping anyway) or it becomes likely that we won't be able to find space
>   for modules as they're loaded into running systems.

Let's note that we already have this issue for BPF and modules right now.
But keeping the kernel at the end of the vmalloc region largely mitigates
this problem: if we exhaust the vmalloc region on 64-bit and then start
allocating here, I think the whole system will have other problems.

> * Relying on a relocatable kernel for sv48 support introduces a fairly
>   large performance hit.

I understand the performance penalty, but I struggle to see it as "fairly
large": can we benchmark this somehow?

>
> Roughly, my proposal would be to:
>
> * Leave the 32-bit memory map alone.  On 32-bit systems we can load modules
>   anywhere and we only have one VA width, so we're not really solving any
>   problems with these changes.

OK, that's possible, although a lot of ifdefs will get involved :)

> * Statically allocate a 2GiB portion of the VA space for all our text, as
>   its own region.  We'd link/relocate the kernel here instead of around
>   PAGE_OFFSET, which would decouple the kernel from the physical memory
>   layout of the system.  This would have the side effect of sorting out a
>   bunch of bootloader headaches that we currently have.

This amounts to doing the same as this patch, but instead of using the
vmalloc region we'd use our own, right? I believe we'd then lose the
vmalloc facilities to allocate modules around this zone.
> * Sort out how to maintain a linear map as the canonical hole moves around
>   between the VA widths without adding a bunch of overhead to the virt2phys
>   and friends.  This is probably going to be the trickiest part, but I think
>   if we just change the page table code to essentially lie about VAs when an
>   sv39 system runs an sv48+sv39 kernel we could make it work -- there'd be
>   some logical complexity involved, but it would remain fast.

I have to think about that.

>
> This doesn't solve the problem of virtually relocatable kernels, but it
> does let us decouple that from the sv48 stuff.  It also lets us stop
> relying on a fixed physical address the kernel is loaded into, which is
> another thing I don't like.
>

Agreed on this one.

> I know this may be a more complicated approach, but there aren't any sv48
> systems around right now so I just don't see the rush to support them,
> particularly when there's a cost to what already exists (for those who
> haven't been watching, so far all the sv48 patch sets have imposed a
> significant performance penalty on all systems).
>

Alex

>>
>> Alex
>>
>> On 7/9/20 at 7:11 AM, Alex Ghiti wrote:
>>> Hi Palmer,
>>>
>>> On 7/9/20 at 1:05 AM, Palmer Dabbelt wrote:
>>>> On Sun, 07 Jun 2020 00:59:46 PDT (-0700), alex@ghiti.fr wrote:
>>>>> This is a preparatory patch for relocatable kernel.
>>>>>
>>>>> The kernel used to be linked at PAGE_OFFSET address and used to be
>>>>> loaded physically at the beginning of the main memory. Therefore, we
>>>>> could use the linear mapping for the kernel mapping.
>>>>>
>>>>> But the relocated kernel base address will be different from
>>>>> PAGE_OFFSET and since in the linear mapping, two different virtual
>>>>> addresses cannot point to the same physical address, the kernel
>>>>> mapping needs to lie outside the linear mapping.
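
To illustrate why the quoted commit message needs a second offset once the
kernel mapping leaves the linear mapping, here is a minimal user-space
sketch, not kernel code. The address constants, `struct layout`,
`make_layout()` and `va_to_pa()` are assumptions made up for illustration;
only the names `PAGE_OFFSET`, `KERNEL_VIRT_ADDR`, `va_pa_offset` and
`va_kernel_pa_offset` come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed example addresses, chosen only so the arithmetic is visible. */
#define PAGE_OFFSET      UINT64_C(0xffffffe000000000) /* start of linear mapping */
#define KERNEL_VIRT_ADDR UINT64_C(0xffffffdf80000000) /* kernel mapping, 2GiB below */

/* One VA-to-PA offset per mapping, mirroring the two globals in the patch. */
struct layout {
    uint64_t va_pa_offset;        /* linear mapping: VA minus PA */
    uint64_t va_kernel_pa_offset; /* kernel mapping: VA minus PA */
};

/* Compute both offsets from the physical address the kernel was loaded at. */
static struct layout make_layout(uint64_t load_pa)
{
    struct layout l = {
        .va_pa_offset = PAGE_OFFSET - load_pa,
        .va_kernel_pa_offset = KERNEL_VIRT_ADDR - load_pa,
    };

    /* The selection in va_to_pa() relies on the kernel mapping sitting below. */
    assert(KERNEL_VIRT_ADDR < PAGE_OFFSET);
    return l;
}

/* Pick the offset by address range: at or above PAGE_OFFSET means linear mapping. */
static uint64_t va_to_pa(const struct layout *l, uint64_t va)
{
    return va >= PAGE_OFFSET ? va - l->va_pa_offset
                             : va - l->va_kernel_pa_offset;
}
```

With a load address of, say, 0x80200000, both `PAGE_OFFSET` and
`KERNEL_VIRT_ADDR` translate back to that same physical address, which is
exactly the aliasing a single linear offset cannot express.
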
>>>>
>>>> I know it's been a while, but I keep opening this up to review it and
>>>> just can't get over how ugly it is to put the kernel's linear map in
>>>> the vmalloc region.
>>>>
>>>> I guess I don't understand why this is necessary at all.
>>>> Specifically: why can't we just relocate the kernel within the linear
>>>> map?  That would let the bootloader put the kernel wherever it wants,
>>>> modulo the physical memory size we support.  We'd need to handle the
>>>> regions that are coupled to the kernel's execution address, but we
>>>> could just put them in an explicit memory region which is what we
>>>> should probably be doing anyway.
>>>
>>> Virtual relocation in the linear mapping requires moving the kernel
>>> physically too. Zong implemented this physical move in his KASLR RFC
>>> patchset, which is cumbersome since finding an available physical spot
>>> is harder than just selecting a virtual range in the vmalloc range.
>>>
>>> In addition, having the kernel mapping in the linear mapping prevents
>>> the use of hugepages for the linear mapping, resulting in performance
>>> loss (at least for the GB that encompasses the kernel).
>>>
>>> Why do you find this "ugly"? The vmalloc region is just a bunch of
>>> available virtual addresses for whatever purpose we want, and as noted
>>> by Zong, arm64 uses the same scheme.
>>>
>>>>
>>>>> In addition, because modules and BPF must be close to the kernel
>>>>> (inside +-2GB window), the kernel is placed at the end of the vmalloc
>>>>> zone minus 2GB, which leaves room for modules and BPF. The kernel
>>>>> could not be placed at the beginning of the vmalloc zone since other
>>>>> vmalloc allocations from the kernel could get all the +-2GB window
>>>>> around the kernel which would prevent new modules and BPF programs
>>>>> from being loaded.
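
The placement described in the quoted commit message can be checked with a
few lines of arithmetic. This is a rough user-space sketch: the
`PAGE_OFFSET` value is an assumed example, `reachable()` and
`check_layout()` are hypothetical helpers, and the +-2GB reach is
approximated as a signed 32-bit offset. `VMALLOC_END` and
`KERNEL_VIRT_ADDR` follow the definitions in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_2G            UINT64_C(0x80000000)
#define PAGE_OFFSET      UINT64_C(0xffffffe000000000) /* assumed example value */
#define VMALLOC_END      (PAGE_OFFSET - 1)            /* as in the patch */
#define KERNEL_VIRT_ADDR (VMALLOC_END - SZ_2G + 1)    /* as in the patch */

/*
 * Approximation: treat "within +-2GB" as "the distance fits in a signed
 * 32-bit offset". Returns true if 'to' is that close to 'from'.
 */
static bool reachable(uint64_t from, uint64_t to)
{
    int64_t delta = (int64_t)(to - from);

    return delta >= INT32_MIN && delta <= INT32_MAX;
}

/* The kernel link address plus 2GB lands exactly on the end of vmalloc. */
static void check_layout(void)
{
    assert(KERNEL_VIRT_ADDR + SZ_2G - 1 == VMALLOC_END);
}
```

Under these assumptions everything between the kernel base and
`VMALLOC_END` stays reachable from the kernel base, while an address more
than 2GB below it does not, which is why ordinary vmalloc allocations
eating into that window matter.
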
>>>>
>>>> Well, that's not enough to make sure this doesn't happen -- it's just
>>>> enough to make sure it doesn't happen very quickly.  That's the same
>>>> boat we're already in, though, so it's not like it's worse.
>>>
>>> Indeed, that's not worse, I haven't found a way to reserve vmalloc area
>>> without actually allocating it.
>>>
>>>>
>>>>> Signed-off-by: Alexandre Ghiti
>>>>> Reviewed-by: Zong Li
>>>>> ---
>>>>>  arch/riscv/boot/loader.lds.S     |  3 +-
>>>>>  arch/riscv/include/asm/page.h    | 10 +++++-
>>>>>  arch/riscv/include/asm/pgtable.h | 38 ++++++++++++++-------
>>>>>  arch/riscv/kernel/head.S         |  3 +-
>>>>>  arch/riscv/kernel/module.c       |  4 +--
>>>>>  arch/riscv/kernel/vmlinux.lds.S  |  3 +-
>>>>>  arch/riscv/mm/init.c             | 58 +++++++++++++++++++++++++-------
>>>>>  arch/riscv/mm/physaddr.c         |  2 +-
>>>>>  8 files changed, 88 insertions(+), 33 deletions(-)
>>>>>
>>>>> diff --git a/arch/riscv/boot/loader.lds.S b/arch/riscv/boot/loader.lds.S
>>>>> index 47a5003c2e28..62d94696a19c 100644
>>>>> --- a/arch/riscv/boot/loader.lds.S
>>>>> +++ b/arch/riscv/boot/loader.lds.S
>>>>> @@ -1,13 +1,14 @@
>>>>>  /* SPDX-License-Identifier: GPL-2.0 */
>>>>>
>>>>>  #include
>>>>> +#include
>>>>>
>>>>>  OUTPUT_ARCH(riscv)
>>>>>  ENTRY(_start)
>>>>>
>>>>>  SECTIONS
>>>>>  {
>>>>> -    . = PAGE_OFFSET;
>>>>> +    . = KERNEL_LINK_ADDR;
>>>>>
>>>>>      .payload : {
>>>>>          *(.payload)
>>>>> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
>>>>> index 2d50f76efe48..48bb09b6a9b7 100644
>>>>> --- a/arch/riscv/include/asm/page.h
>>>>> +++ b/arch/riscv/include/asm/page.h
>>>>> @@ -90,18 +90,26 @@ typedef struct page *pgtable_t;
>>>>>
>>>>>  #ifdef CONFIG_MMU
>>>>>  extern unsigned long va_pa_offset;
>>>>> +extern unsigned long va_kernel_pa_offset;
>>>>>  extern unsigned long pfn_base;
>>>>>  #define ARCH_PFN_OFFSET        (pfn_base)
>>>>>  #else
>>>>>  #define va_pa_offset        0
>>>>> +#define va_kernel_pa_offset    0
>>>>>  #define ARCH_PFN_OFFSET        (PAGE_OFFSET >> PAGE_SHIFT)
>>>>>  #endif /* CONFIG_MMU */
>>>>>
>>>>>  extern unsigned long max_low_pfn;
>>>>>  extern unsigned long min_low_pfn;
>>>>> +extern unsigned long kernel_virt_addr;
>>>>>
>>>>>  #define __pa_to_va_nodebug(x)    ((void *)((unsigned long) (x) + va_pa_offset))
>>>>> -#define __va_to_pa_nodebug(x)    ((unsigned long)(x) - va_pa_offset)
>>>>> +#define linear_mapping_va_to_pa(x)    ((unsigned long)(x) - va_pa_offset)
>>>>> +#define kernel_mapping_va_to_pa(x)    \
>>>>> +    ((unsigned long)(x) - va_kernel_pa_offset)
>>>>> +#define __va_to_pa_nodebug(x)        \
>>>>> +    (((x) >= PAGE_OFFSET) ?        \
>>>>> +        linear_mapping_va_to_pa(x) : kernel_mapping_va_to_pa(x))
>>>>>
>>>>>  #ifdef CONFIG_DEBUG_VIRTUAL
>>>>>  extern phys_addr_t __virt_to_phys(unsigned long x);
>>>>> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
>>>>> index 35b60035b6b0..94ef3b49dfb6 100644
>>>>> --- a/arch/riscv/include/asm/pgtable.h
>>>>> +++ b/arch/riscv/include/asm/pgtable.h
>>>>> @@ -11,23 +11,29 @@
>>>>>
>>>>>  #include
>>>>>
>>>>> -#ifndef __ASSEMBLY__
>>>>> -
>>>>> -/* Page Upper Directory not used in RISC-V */
>>>>> -#include
>>>>> -#include
>>>>> -#include
>>>>> -#include
>>>>> -
>>>>> -#ifdef CONFIG_MMU
>>>>> +#ifndef CONFIG_MMU
>>>>> +#define KERNEL_VIRT_ADDR    PAGE_OFFSET
>>>>> +#define KERNEL_LINK_ADDR    PAGE_OFFSET
>>>>> +#else
>>>>> +/*
>>>>> + * Leave 2GB for modules and BPF that must lie within a 2GB range around
>>>>> + * the kernel.
>>>>> + */
>>>>> +#define KERNEL_VIRT_ADDR    (VMALLOC_END - SZ_2G + 1)
>>>>> +#define KERNEL_LINK_ADDR    KERNEL_VIRT_ADDR
>>>>
>>>> At a bare minimum this is going to make a mess of the 32-bit port, as
>>>> non-relocatable kernels are now going to get linked at 1GiB which is
>>>> where user code is supposed to live.  That's an easy fix, though, as
>>>> the 32-bit stuff doesn't need any module address restrictions.
>>>
>>> Indeed, I will take a look at that.
>>>
>>>>
>>>>>  #define VMALLOC_SIZE     (KERN_VIRT_SIZE >> 1)
>>>>>  #define VMALLOC_END      (PAGE_OFFSET - 1)
>>>>>  #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)
>>>>>
>>>>>  #define BPF_JIT_REGION_SIZE    (SZ_128M)
>>>>> -#define BPF_JIT_REGION_START    (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
>>>>> -#define BPF_JIT_REGION_END    (VMALLOC_END)
>>>>> +#define BPF_JIT_REGION_START    PFN_ALIGN((unsigned long)&_end)
>>>>> +#define BPF_JIT_REGION_END    (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE)
>>>>> +
>>>>> +#ifdef CONFIG_64BIT
>>>>> +#define VMALLOC_MODULE_START    BPF_JIT_REGION_END
>>>>> +#define VMALLOC_MODULE_END    (((unsigned long)&_start & PAGE_MASK) + SZ_2G)
>>>>> +#endif
>>>>>
>>>>>  /*
>>>>>   * Roughly size the vmemmap space to be large enough to fit enough
>>>>> @@ -57,9 +63,16 @@
>>>>>  #define FIXADDR_SIZE     PGDIR_SIZE
>>>>>  #endif
>>>>>  #define FIXADDR_START    (FIXADDR_TOP - FIXADDR_SIZE)
>>>>> -
>>>>>  #endif
>>>>>
>>>>> +#ifndef __ASSEMBLY__
>>>>> +
>>>>> +/* Page Upper Directory not used in RISC-V */
>>>>> +#include
>>>>> +#include
>>>>> +#include
>>>>> +#include
>>>>> +
>>>>>  #ifdef CONFIG_64BIT
>>>>>  #include
>>>>>  #else
>>>>> @@ -483,6 +496,7 @@ static inline void __kernel_map_pages(struct page *page, int numpages, int enabl
>>>>>
>>>>>  #define kern_addr_valid(addr)   (1) /* FIXME */
>>>>>
>>>>> +extern char _start[];
>>>>>  extern void *dtb_early_va;
>>>>>  void setup_bootmem(void);
>>>>>  void paging_init(void);
>>>>> diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
>>>>> index 98a406474e7d..8f5bb7731327 100644
>>>>> --- a/arch/riscv/kernel/head.S
>>>>> +++ b/arch/riscv/kernel/head.S
>>>>> @@ -49,7 +49,8 @@ ENTRY(_start)
>>>>>  #ifdef CONFIG_MMU
>>>>>  relocate:
>>>>>      /* Relocate return address */
>>>>> -    li a1, PAGE_OFFSET
>>>>> +    la a1, kernel_virt_addr
>>>>> +    REG_L a1, 0(a1)
>>>>>      la a2, _start
>>>>>      sub a1, a1, a2
>>>>>      add ra, ra, a1
>>>>> diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
>>>>> index 8bbe5dbe1341..1a8fbe05accf 100644
>>>>> --- a/arch/riscv/kernel/module.c
>>>>> +++ b/arch/riscv/kernel/module.c
>>>>> @@ -392,12 +392,10 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
>>>>>  }
>>>>>
>>>>>  #if defined(CONFIG_MMU) && defined(CONFIG_64BIT)
>>>>> -#define VMALLOC_MODULE_START \
>>>>> -     max(PFN_ALIGN((unsigned long)&_end - SZ_2G), VMALLOC_START)
>>>>>  void *module_alloc(unsigned long size)
>>>>>  {
>>>>>      return __vmalloc_node_range(size, 1, VMALLOC_MODULE_START,
>>>>> -                    VMALLOC_END, GFP_KERNEL,
>>>>> +                    VMALLOC_MODULE_END, GFP_KERNEL,
>>>>>                      PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
>>>>>                      __builtin_return_address(0));
>>>>>  }
>>>>> diff --git a/arch/riscv/kernel/vmlinux.lds.S b/arch/riscv/kernel/vmlinux.lds.S
>>>>> index 0339b6bbe11a..a9abde62909f 100644
>>>>> --- a/arch/riscv/kernel/vmlinux.lds.S
>>>>> +++ b/arch/riscv/kernel/vmlinux.lds.S
>>>>> @@ -4,7 +4,8 @@
>>>>>   * Copyright (C) 2017 SiFive
>>>>>   */
>>>>>
>>>>> -#define LOAD_OFFSET PAGE_OFFSET
>>>>> +#include
>>>>> +#define LOAD_OFFSET KERNEL_LINK_ADDR
>>>>>  #include
>>>>>  #include
>>>>>  #include
>>>>> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
>>>>> index 736de6c8739f..71da78914645 100644
>>>>> --- a/arch/riscv/mm/init.c
>>>>> +++ b/arch/riscv/mm/init.c
>>>>> @@ -22,6 +22,9 @@
>>>>>
>>>>>  #include "../kernel/head.h"
>>>>>
>>>>> +unsigned long kernel_virt_addr = KERNEL_VIRT_ADDR;
>>>>> +EXPORT_SYMBOL(kernel_virt_addr);
>>>>> +
>>>>>  unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)]
>>>>>                              __page_aligned_bss;
>>>>>  EXPORT_SYMBOL(empty_zero_page);
>>>>> @@ -178,8 +181,12 @@ void __init setup_bootmem(void)
>>>>>  }
>>>>>
>>>>>  #ifdef CONFIG_MMU
>>>>> +/* Offset between linear mapping virtual address and kernel load address */
>>>>>  unsigned long va_pa_offset;
>>>>>  EXPORT_SYMBOL(va_pa_offset);
>>>>> +/* Offset between kernel mapping virtual address and kernel load address */
>>>>> +unsigned long va_kernel_pa_offset;
>>>>> +EXPORT_SYMBOL(va_kernel_pa_offset);
>>>>>  unsigned long pfn_base;
>>>>>  EXPORT_SYMBOL(pfn_base);
>>>>>
>>>>> @@ -271,7 +278,7 @@ static phys_addr_t __init alloc_pmd(uintptr_t va)
>>>>>      if (mmu_enabled)
>>>>>          return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
>>>>>
>>>>> -    pmd_num = (va - PAGE_OFFSET) >> PGDIR_SHIFT;
>>>>> +    pmd_num = (va - kernel_virt_addr) >> PGDIR_SHIFT;
>>>>>      BUG_ON(pmd_num >= NUM_EARLY_PMDS);
>>>>>      return (uintptr_t)&early_pmd[pmd_num * PTRS_PER_PMD];
>>>>>  }
>>>>> @@ -372,14 +379,30 @@ static uintptr_t __init best_map_size(phys_addr_t base, phys_addr_t size)
>>>>>  #error "setup_vm() is called from head.S before relocate so it should not use absolute addressing."
>>>>>  #endif
>>>>>
>>>>> +static uintptr_t load_pa, load_sz;
>>>>> +
>>>>> +static void __init create_kernel_page_table(pgd_t *pgdir, uintptr_t map_size)
>>>>> +{
>>>>> +    uintptr_t va, end_va;
>>>>> +
>>>>> +    end_va = kernel_virt_addr + load_sz;
>>>>> +    for (va = kernel_virt_addr; va < end_va; va += map_size)
>>>>> +        create_pgd_mapping(pgdir, va,
>>>>> +                   load_pa + (va - kernel_virt_addr),
>>>>> +                   map_size, PAGE_KERNEL_EXEC);
>>>>> +}
>>>>> +
>>>>>  asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>>>>>  {
>>>>>      uintptr_t va, end_va;
>>>>> -    uintptr_t load_pa = (uintptr_t)(&_start);
>>>>> -    uintptr_t load_sz = (uintptr_t)(&_end) - load_pa;
>>>>>      uintptr_t map_size = best_map_size(load_pa, MAX_EARLY_MAPPING_SIZE);
>>>>>
>>>>> +    load_pa = (uintptr_t)(&_start);
>>>>> +    load_sz = (uintptr_t)(&_end) - load_pa;
>>>>> +
>>>>>      va_pa_offset = PAGE_OFFSET - load_pa;
>>>>> +    va_kernel_pa_offset = kernel_virt_addr - load_pa;
>>>>> +
>>>>>      pfn_base = PFN_DOWN(load_pa);
>>>>>
>>>>>      /*
>>>>> @@ -402,26 +425,22 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>>>>>      create_pmd_mapping(fixmap_pmd, FIXADDR_START,
>>>>>                 (uintptr_t)fixmap_pte, PMD_SIZE, PAGE_TABLE);
>>>>>      /* Setup trampoline PGD and PMD */
>>>>> -    create_pgd_mapping(trampoline_pg_dir, PAGE_OFFSET,
>>>>> +    create_pgd_mapping(trampoline_pg_dir, kernel_virt_addr,
>>>>>                 (uintptr_t)trampoline_pmd, PGDIR_SIZE, PAGE_TABLE);
>>>>> -    create_pmd_mapping(trampoline_pmd, PAGE_OFFSET,
>>>>> +    create_pmd_mapping(trampoline_pmd, kernel_virt_addr,
>>>>>                 load_pa, PMD_SIZE, PAGE_KERNEL_EXEC);
>>>>>  #else
>>>>>      /* Setup trampoline PGD */
>>>>> -    create_pgd_mapping(trampoline_pg_dir, PAGE_OFFSET,
>>>>> +    create_pgd_mapping(trampoline_pg_dir, kernel_virt_addr,
>>>>>                 load_pa, PGDIR_SIZE, PAGE_KERNEL_EXEC);
>>>>>  #endif
>>>>>
>>>>>      /*
>>>>> -     * Setup early PGD covering entire kernel which will allows
>>>>> +     * Setup early PGD covering entire kernel which will allow
>>>>>       * us to reach paging_init(). We map all memory banks later
>>>>>       * in setup_vm_final() below.
>>>>>       */
>>>>> -    end_va = PAGE_OFFSET + load_sz;
>>>>> -    for (va = PAGE_OFFSET; va < end_va; va += map_size)
>>>>> -        create_pgd_mapping(early_pg_dir, va,
>>>>> -                   load_pa + (va - PAGE_OFFSET),
>>>>> -                   map_size, PAGE_KERNEL_EXEC);
>>>>> +    create_kernel_page_table(early_pg_dir, map_size);
>>>>>
>>>>>      /* Create fixed mapping for early FDT parsing */
>>>>>      end_va = __fix_to_virt(FIX_FDT) + FIX_FDT_SIZE;
>>>>> @@ -441,6 +460,7 @@ static void __init setup_vm_final(void)
>>>>>      uintptr_t va, map_size;
>>>>>      phys_addr_t pa, start, end;
>>>>>      struct memblock_region *reg;
>>>>> +    static struct vm_struct vm_kernel = { 0 };
>>>>>
>>>>>      /* Set mmu_enabled flag */
>>>>>      mmu_enabled = true;
>>>>> @@ -467,10 +487,22 @@ static void __init setup_vm_final(void)
>>>>>          for (pa = start; pa < end; pa += map_size) {
>>>>>              va = (uintptr_t)__va(pa);
>>>>>              create_pgd_mapping(swapper_pg_dir, va, pa,
>>>>> -                       map_size, PAGE_KERNEL_EXEC);
>>>>> +                       map_size, PAGE_KERNEL);
>>>>>          }
>>>>>      }
>>>>>
>>>>> +    /* Map the kernel */
>>>>> +    create_kernel_page_table(swapper_pg_dir, PMD_SIZE);
>>>>> +
>>>>> +    /* Reserve the vmalloc area occupied by the kernel */
>>>>> +    vm_kernel.addr = (void *)kernel_virt_addr;
>>>>> +    vm_kernel.phys_addr = load_pa;
>>>>> +    vm_kernel.size = (load_sz + PMD_SIZE - 1) & ~(PMD_SIZE - 1);
>>>>> +    vm_kernel.flags = VM_MAP | VM_NO_GUARD;
>>>>> +    vm_kernel.caller = __builtin_return_address(0);
>>>>> +
>>>>> +    vm_area_add_early(&vm_kernel);
>>>>> +
>>>>>      /* Clear fixmap PTE and PMD mappings */
>>>>>      clear_fixmap(FIX_PTE);
>>>>>      clear_fixmap(FIX_PMD);
>>>>> diff --git a/arch/riscv/mm/physaddr.c b/arch/riscv/mm/physaddr.c
>>>>> index e8e4dcd39fed..35703d5ef5fd 100644
>>>>> --- a/arch/riscv/mm/physaddr.c
>>>>> +++ b/arch/riscv/mm/physaddr.c
>>>>> @@ -23,7 +23,7 @@ EXPORT_SYMBOL(__virt_to_phys);
>>>>>
>>>>>  phys_addr_t __phys_addr_symbol(unsigned long x)
>>>>>  {
>>>>> -    unsigned long kernel_start = (unsigned long)PAGE_OFFSET;
>>>>> +    unsigned long kernel_start = (unsigned long)kernel_virt_addr;
>>>>>      unsigned long kernel_end = (unsigned long)_end;
>>>>>
>>>>>      /*
>>>
>>> Alex