Date: Tue, 21 Jul 2020 12:05:40 -0700 (PDT)
From: Palmer Dabbelt
To: alex@ghiti.fr
CC: mpe@ellerman.id.au, benh@kernel.crashing.org, paulus@samba.org, Paul Walmsley, aou@eecs.berkeley.edu, Anup Patel, Atish Patra, zong.li@sifive.com, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-mm@kvack.org
Subject: Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone
In-Reply-To: <7cb2285e-68ba-6827-5e61-e33a4b65ac03@ghiti.fr>

On Tue, 21 Jul 2020 11:36:10 PDT (-0700), alex@ghiti.fr wrote:
> Let's try to make progress here: I add linux-mm in CC to get feedback on
> this patch as it blocks sv48 support too.

Sorry for being slow here.  I haven't replied because I hadn't really fleshed out the design yet, but just so everyone's on the same page, my problems with this are:

* We waste vmalloc space on 32-bit systems, where there isn't a lot of it.
* On 64-bit systems the VA space around the kernel is precious because it's
  the only place we can place text (modules, BPF, whatever).  If we start
  putting the kernel in the vmalloc space then we either have to pre-allocate
  a bunch of space around it (essentially making it a fixed mapping anyway) or
  it becomes likely that we won't be able to find space for modules as they're
  loaded into running systems.
* Relying on a relocatable kernel for sv48 support introduces a fairly large
  performance hit.

Roughly, my proposal would be to:

* Leave the 32-bit memory map alone.  On 32-bit systems we can load modules
  anywhere and we only have one VA width, so we're not really solving any
  problems with these changes.
* Statically allocate a 2GiB portion of the VA space for all our text, as its
  own region.  We'd link/relocate the kernel here instead of around
  PAGE_OFFSET, which would decouple the kernel from the physical memory layout
  of the system.  This would have the side effect of sorting out a bunch of
  bootloader headaches that we currently have.
* Sort out how to maintain a linear map as the canonical hole moves around
  between the VA widths without adding a bunch of overhead to virt2phys and
  friends.  This is probably going to be the trickiest part, but I think if we
  just change the page table code to essentially lie about VAs when an sv39
  system runs an sv48+sv39 kernel we could make it work -- there'd be some
  logical complexity involved, but it would remain fast.

This doesn't solve the problem of virtually relocatable kernels, but it does let us decouple that from the sv48 stuff.  It also lets us stop relying on a fixed physical address the kernel is loaded at, which is another thing I don't like.

I know this may be a more complicated approach, but there aren't any sv48 systems around right now so I just don't see the rush to support them, particularly when there's a cost to what already exists (for those who haven't been watching, so far all the sv48 patch sets have imposed a significant performance penalty on all systems).
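For readers following the thread, the two-offset translation the patch under discussion introduces (and whose per-call cost the virt2phys remark above alludes to) can be sketched in plain userspace C. The addresses below are illustrative stand-ins, not the real RISC-V sv39 constants; only the relationships mirror the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins, not the actual kernel layout. */
#define PAGE_OFFSET       0xffffffe000000000ULL  /* start of the linear map */
#define SZ_2G             0x80000000ULL
#define KERNEL_VIRT_ADDR  (PAGE_OFFSET - SZ_2G)  /* kernel link address, below the linear map */
#define LOAD_PA           0x80200000ULL          /* physical load address */

/* Mirrors va_pa_offset / va_kernel_pa_offset from the patch. */
static const uint64_t va_pa_offset        = PAGE_OFFSET - LOAD_PA;
static const uint64_t va_kernel_pa_offset = KERNEL_VIRT_ADDR - LOAD_PA;

/*
 * Once the kernel no longer lives inside the linear map, va->pa needs a
 * range check to pick the right offset (cf. __va_to_pa_nodebug in the
 * quoted patch): linear-map VAs use one offset, kernel-map VAs another.
 */
static uint64_t va_to_pa(uint64_t va)
{
	if (va >= PAGE_OFFSET)			/* linear-map address */
		return va - va_pa_offset;
	return va - va_kernel_pa_offset;	/* kernel-map address */
}
```

Both mappings resolve to the same physical load address for the kernel's first byte; the extra branch on every translation is the kind of overhead the proposal above wants to keep out of virt2phys and friends.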
>
> Alex
>
> On 7/9/20 7:11 AM, Alex Ghiti wrote:
>> Hi Palmer,
>>
>> On 7/9/20 1:05 AM, Palmer Dabbelt wrote:
>>> On Sun, 07 Jun 2020 00:59:46 PDT (-0700), alex@ghiti.fr wrote:
>>>> This is a preparatory patch for relocatable kernel.
>>>>
>>>> The kernel used to be linked at PAGE_OFFSET address and used to be loaded
>>>> physically at the beginning of the main memory. Therefore, we could use
>>>> the linear mapping for the kernel mapping.
>>>>
>>>> But the relocated kernel base address will be different from PAGE_OFFSET
>>>> and since in the linear mapping, two different virtual addresses cannot
>>>> point to the same physical address, the kernel mapping needs to lie
>>>> outside the linear mapping.
>>>
>>> I know it's been a while, but I keep opening this up to review it and just
>>> can't get over how ugly it is to put the kernel's linear map in the
>>> vmalloc region.
>>>
>>> I guess I don't understand why this is necessary at all.  Specifically:
>>> why can't we just relocate the kernel within the linear map?  That would
>>> let the bootloader put the kernel wherever it wants, modulo the physical
>>> memory size we support.  We'd need to handle the regions that are coupled
>>> to the kernel's execution address, but we could just put them in an
>>> explicit memory region, which is what we should probably be doing anyway.
>>
>> Virtual relocation in the linear mapping requires moving the kernel
>> physically too. Zong implemented this physical move in his KASLR RFC
>> patchset, which is cumbersome since finding an available physical spot
>> is harder than just selecting a virtual range in the vmalloc range.
>>
>> In addition, having the kernel mapping in the linear mapping prevents
>> the use of hugepages for the linear mapping, resulting in performance loss
>> (at least for the GB that encompasses the kernel).
>>
>> Why do you find this "ugly"? The vmalloc region is just a bunch of
>> available virtual addresses for whatever purpose we want, and as noted by
>> Zong, arm64 uses the same scheme.
>>
>>>
>>>> In addition, because modules and BPF must be close to the kernel (inside
>>>> a +-2GB window), the kernel is placed at the end of the vmalloc zone minus
>>>> 2GB, which leaves room for modules and BPF. The kernel could not be
>>>> placed at the beginning of the vmalloc zone since other vmalloc
>>>> allocations from the kernel could get all the +-2GB window around the
>>>> kernel, which would prevent new modules and BPF programs from being loaded.
>>>
>>> Well, that's not enough to make sure this doesn't happen -- it's just
>>> enough to make sure it doesn't happen very quickly.  That's the same boat
>>> we're already in, though, so it's not like it's worse.
>>
>> Indeed, that's not worse; I haven't found a way to reserve vmalloc area
>> without actually allocating it.
>>
>>>
>>>> Signed-off-by: Alexandre Ghiti
>>>> Reviewed-by: Zong Li
>>>> ---
>>>>  arch/riscv/boot/loader.lds.S     |  3 +-
>>>>  arch/riscv/include/asm/page.h    | 10 +++++-
>>>>  arch/riscv/include/asm/pgtable.h | 38 ++++++++++++++-------
>>>>  arch/riscv/kernel/head.S         |  3 +-
>>>>  arch/riscv/kernel/module.c       |  4 +--
>>>>  arch/riscv/kernel/vmlinux.lds.S  |  3 +-
>>>>  arch/riscv/mm/init.c             | 58 +++++++++++++++++++++++++-------
>>>>  arch/riscv/mm/physaddr.c         |  2 +-
>>>>  8 files changed, 88 insertions(+), 33 deletions(-)
>>>>
>>>> diff --git a/arch/riscv/boot/loader.lds.S b/arch/riscv/boot/loader.lds.S
>>>> index 47a5003c2e28..62d94696a19c 100644
>>>> --- a/arch/riscv/boot/loader.lds.S
>>>> +++ b/arch/riscv/boot/loader.lds.S
>>>> @@ -1,13 +1,14 @@
>>>>  /* SPDX-License-Identifier: GPL-2.0 */
>>>>
>>>>  #include
>>>> +#include
>>>>
>>>>  OUTPUT_ARCH(riscv)
>>>>  ENTRY(_start)
>>>>
>>>>  SECTIONS
>>>>  {
>>>> -    . = PAGE_OFFSET;
>>>> +    . = KERNEL_LINK_ADDR;
>>>>
>>>>      .payload : {
>>>>          *(.payload)
>>>> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
>>>> index 2d50f76efe48..48bb09b6a9b7 100644
>>>> --- a/arch/riscv/include/asm/page.h
>>>> +++ b/arch/riscv/include/asm/page.h
>>>> @@ -90,18 +90,26 @@ typedef struct page *pgtable_t;
>>>>
>>>>  #ifdef CONFIG_MMU
>>>>  extern unsigned long va_pa_offset;
>>>> +extern unsigned long va_kernel_pa_offset;
>>>>  extern unsigned long pfn_base;
>>>>  #define ARCH_PFN_OFFSET        (pfn_base)
>>>>  #else
>>>>  #define va_pa_offset        0
>>>> +#define va_kernel_pa_offset    0
>>>>  #define ARCH_PFN_OFFSET        (PAGE_OFFSET >> PAGE_SHIFT)
>>>>  #endif /* CONFIG_MMU */
>>>>
>>>>  extern unsigned long max_low_pfn;
>>>>  extern unsigned long min_low_pfn;
>>>> +extern unsigned long kernel_virt_addr;
>>>>
>>>>  #define __pa_to_va_nodebug(x)    ((void *)((unsigned long) (x) + va_pa_offset))
>>>> -#define __va_to_pa_nodebug(x)    ((unsigned long)(x) - va_pa_offset)
>>>> +#define linear_mapping_va_to_pa(x)    ((unsigned long)(x) - va_pa_offset)
>>>> +#define kernel_mapping_va_to_pa(x)    \
>>>> +    ((unsigned long)(x) - va_kernel_pa_offset)
>>>> +#define __va_to_pa_nodebug(x)        \
>>>> +    (((x) >= PAGE_OFFSET) ?        \
>>>> +        linear_mapping_va_to_pa(x) : kernel_mapping_va_to_pa(x))
>>>>
>>>>  #ifdef CONFIG_DEBUG_VIRTUAL
>>>>  extern phys_addr_t __virt_to_phys(unsigned long x);
>>>> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
>>>> index 35b60035b6b0..94ef3b49dfb6 100644
>>>> --- a/arch/riscv/include/asm/pgtable.h
>>>> +++ b/arch/riscv/include/asm/pgtable.h
>>>> @@ -11,23 +11,29 @@
>>>>
>>>>  #include
>>>>
>>>> -#ifndef __ASSEMBLY__
>>>> -
>>>> -/* Page Upper Directory not used in RISC-V */
>>>> -#include
>>>> -#include
>>>> -#include
>>>> -#include
>>>> -
>>>> -#ifdef CONFIG_MMU
>>>> +#ifndef CONFIG_MMU
>>>> +#define KERNEL_VIRT_ADDR    PAGE_OFFSET
>>>> +#define KERNEL_LINK_ADDR    PAGE_OFFSET
>>>> +#else
>>>> +/*
>>>> + * Leave 2GB for modules and BPF that must lie within a 2GB range around
>>>> + * the kernel.
>>>> + */
>>>> +#define KERNEL_VIRT_ADDR    (VMALLOC_END - SZ_2G + 1)
>>>> +#define KERNEL_LINK_ADDR    KERNEL_VIRT_ADDR
>>>
>>> At a bare minimum this is going to make a mess of the 32-bit port, as
>>> non-relocatable kernels are now going to get linked at 1GiB, which is
>>> where user code is supposed to live.  That's an easy fix, though, as the
>>> 32-bit stuff doesn't need any module address restrictions.
>>
>> Indeed, I will take a look at that.
>>
>>>
>>>>  #define VMALLOC_SIZE     (KERN_VIRT_SIZE >> 1)
>>>>  #define VMALLOC_END      (PAGE_OFFSET - 1)
>>>>  #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)
>>>>
>>>>  #define BPF_JIT_REGION_SIZE    (SZ_128M)
>>>> -#define BPF_JIT_REGION_START    (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
>>>> -#define BPF_JIT_REGION_END    (VMALLOC_END)
>>>> +#define BPF_JIT_REGION_START    PFN_ALIGN((unsigned long)&_end)
>>>> +#define BPF_JIT_REGION_END    (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE)
>>>> +
>>>> +#ifdef CONFIG_64BIT
>>>> +#define VMALLOC_MODULE_START    BPF_JIT_REGION_END
>>>> +#define VMALLOC_MODULE_END    (((unsigned long)&_start & PAGE_MASK) + SZ_2G)
>>>> +#endif
>>>>
>>>>  /*
>>>>   * Roughly size the vmemmap space to be large enough to fit enough
>>>> @@ -57,9 +63,16 @@
>>>>  #define FIXADDR_SIZE     PGDIR_SIZE
>>>>  #endif
>>>>  #define FIXADDR_START    (FIXADDR_TOP - FIXADDR_SIZE)
>>>> -
>>>>  #endif
>>>>
>>>> +#ifndef __ASSEMBLY__
>>>> +
>>>> +/* Page Upper Directory not used in RISC-V */
>>>> +#include
>>>> +#include
>>>> +#include
>>>> +#include
>>>> +
>>>>  #ifdef CONFIG_64BIT
>>>>  #include
>>>>  #else
>>>> @@ -483,6 +496,7 @@ static inline void __kernel_map_pages(struct page *page, int numpages, int enabl
>>>>
>>>>  #define kern_addr_valid(addr)   (1) /* FIXME */
>>>>
>>>> +extern char _start[];
>>>>  extern void *dtb_early_va;
>>>>  void setup_bootmem(void);
>>>>  void paging_init(void);
>>>> diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
>>>> index 98a406474e7d..8f5bb7731327 100644
>>>> --- a/arch/riscv/kernel/head.S
>>>> +++ b/arch/riscv/kernel/head.S
>>>> @@ -49,7 +49,8 @@ ENTRY(_start)
>>>>  #ifdef CONFIG_MMU
>>>>  relocate:
>>>>      /* Relocate return address */
>>>> -    li a1, PAGE_OFFSET
>>>> +    la a1, kernel_virt_addr
>>>> +    REG_L a1, 0(a1)
>>>>      la a2, _start
>>>>      sub a1, a1, a2
>>>>      add ra, ra, a1
>>>> diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
>>>> index 8bbe5dbe1341..1a8fbe05accf 100644
>>>> --- a/arch/riscv/kernel/module.c
>>>> +++ b/arch/riscv/kernel/module.c
>>>> @@ -392,12 +392,10 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
>>>>  }
>>>>
>>>>  #if defined(CONFIG_MMU) && defined(CONFIG_64BIT)
>>>> -#define VMALLOC_MODULE_START \
>>>> -     max(PFN_ALIGN((unsigned long)&_end - SZ_2G), VMALLOC_START)
>>>>  void *module_alloc(unsigned long size)
>>>>  {
>>>>      return __vmalloc_node_range(size, 1, VMALLOC_MODULE_START,
>>>> -                    VMALLOC_END, GFP_KERNEL,
>>>> +                    VMALLOC_MODULE_END, GFP_KERNEL,
>>>>                      PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
>>>>                      __builtin_return_address(0));
>>>>  }
>>>> diff --git a/arch/riscv/kernel/vmlinux.lds.S b/arch/riscv/kernel/vmlinux.lds.S
>>>> index 0339b6bbe11a..a9abde62909f 100644
>>>> --- a/arch/riscv/kernel/vmlinux.lds.S
>>>> +++ b/arch/riscv/kernel/vmlinux.lds.S
>>>> @@ -4,7 +4,8 @@
>>>>   * Copyright (C) 2017 SiFive
>>>>   */
>>>>
>>>> -#define LOAD_OFFSET PAGE_OFFSET
>>>> +#include
>>>> +#define LOAD_OFFSET KERNEL_LINK_ADDR
>>>>  #include
>>>>  #include
>>>>  #include
>>>> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
>>>> index 736de6c8739f..71da78914645 100644
>>>> --- a/arch/riscv/mm/init.c
>>>> +++ b/arch/riscv/mm/init.c
>>>> @@ -22,6 +22,9 @@
>>>>
>>>>  #include "../kernel/head.h"
>>>>
>>>> +unsigned long kernel_virt_addr = KERNEL_VIRT_ADDR;
>>>> +EXPORT_SYMBOL(kernel_virt_addr);
>>>> +
>>>>  unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)]
>>>>                             __page_aligned_bss;
>>>>  EXPORT_SYMBOL(empty_zero_page);
>>>> @@ -178,8 +181,12 @@ void __init setup_bootmem(void)
>>>>  }
>>>>
>>>>  #ifdef CONFIG_MMU
>>>> +/* Offset between linear mapping virtual address and kernel load address */
>>>>  unsigned long va_pa_offset;
>>>>  EXPORT_SYMBOL(va_pa_offset);
>>>> +/* Offset between kernel mapping virtual address and kernel load address */
>>>> +unsigned long va_kernel_pa_offset;
>>>> +EXPORT_SYMBOL(va_kernel_pa_offset);
>>>>  unsigned long pfn_base;
>>>>  EXPORT_SYMBOL(pfn_base);
>>>>
>>>> @@ -271,7 +278,7 @@ static phys_addr_t __init alloc_pmd(uintptr_t va)
>>>>      if (mmu_enabled)
>>>>          return memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
>>>>
>>>> -    pmd_num = (va - PAGE_OFFSET) >> PGDIR_SHIFT;
>>>> +    pmd_num = (va - kernel_virt_addr) >> PGDIR_SHIFT;
>>>>      BUG_ON(pmd_num >= NUM_EARLY_PMDS);
>>>>      return (uintptr_t)&early_pmd[pmd_num * PTRS_PER_PMD];
>>>>  }
>>>> @@ -372,14 +379,30 @@ static uintptr_t __init best_map_size(phys_addr_t base, phys_addr_t size)
>>>>  #error "setup_vm() is called from head.S before relocate so it should not use absolute addressing."
>>>>  #endif
>>>>
>>>> +static uintptr_t load_pa, load_sz;
>>>> +
>>>> +static void __init create_kernel_page_table(pgd_t *pgdir, uintptr_t map_size)
>>>> +{
>>>> +    uintptr_t va, end_va;
>>>> +
>>>> +    end_va = kernel_virt_addr + load_sz;
>>>> +    for (va = kernel_virt_addr; va < end_va; va += map_size)
>>>> +        create_pgd_mapping(pgdir, va,
>>>> +                   load_pa + (va - kernel_virt_addr),
>>>> +                   map_size, PAGE_KERNEL_EXEC);
>>>> +}
>>>> +
>>>>  asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>>>>  {
>>>>      uintptr_t va, end_va;
>>>> -    uintptr_t load_pa = (uintptr_t)(&_start);
>>>> -    uintptr_t load_sz = (uintptr_t)(&_end) - load_pa;
>>>>      uintptr_t map_size = best_map_size(load_pa, MAX_EARLY_MAPPING_SIZE);
>>>>
>>>> +    load_pa = (uintptr_t)(&_start);
>>>> +    load_sz = (uintptr_t)(&_end) - load_pa;
>>>> +
>>>>      va_pa_offset = PAGE_OFFSET - load_pa;
>>>> +    va_kernel_pa_offset = kernel_virt_addr - load_pa;
>>>> +
>>>>      pfn_base = PFN_DOWN(load_pa);
>>>>
>>>>      /*
>>>> @@ -402,26 +425,22 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>>>>      create_pmd_mapping(fixmap_pmd, FIXADDR_START,
>>>>                 (uintptr_t)fixmap_pte, PMD_SIZE, PAGE_TABLE);
>>>>      /* Setup trampoline PGD and PMD */
>>>> -    create_pgd_mapping(trampoline_pg_dir, PAGE_OFFSET,
>>>> +    create_pgd_mapping(trampoline_pg_dir, kernel_virt_addr,
>>>>                 (uintptr_t)trampoline_pmd, PGDIR_SIZE, PAGE_TABLE);
>>>> -    create_pmd_mapping(trampoline_pmd, PAGE_OFFSET,
>>>> +    create_pmd_mapping(trampoline_pmd, kernel_virt_addr,
>>>>                 load_pa, PMD_SIZE, PAGE_KERNEL_EXEC);
>>>>  #else
>>>>      /* Setup trampoline PGD */
>>>> -    create_pgd_mapping(trampoline_pg_dir, PAGE_OFFSET,
>>>> +    create_pgd_mapping(trampoline_pg_dir, kernel_virt_addr,
>>>>                 load_pa, PGDIR_SIZE, PAGE_KERNEL_EXEC);
>>>>  #endif
>>>>
>>>>      /*
>>>> -     * Setup early PGD covering entire kernel which will allows
>>>> +     * Setup early PGD covering entire kernel which will allow
>>>>       * us to reach paging_init(). We map all memory banks later
>>>>       * in setup_vm_final() below.
>>>>       */
>>>> -    end_va = PAGE_OFFSET + load_sz;
>>>> -    for (va = PAGE_OFFSET; va < end_va; va += map_size)
>>>> -        create_pgd_mapping(early_pg_dir, va,
>>>> -                   load_pa + (va - PAGE_OFFSET),
>>>> -                   map_size, PAGE_KERNEL_EXEC);
>>>> +    create_kernel_page_table(early_pg_dir, map_size);
>>>>
>>>>      /* Create fixed mapping for early FDT parsing */
>>>>      end_va = __fix_to_virt(FIX_FDT) + FIX_FDT_SIZE;
>>>> @@ -441,6 +460,7 @@ static void __init setup_vm_final(void)
>>>>      uintptr_t va, map_size;
>>>>      phys_addr_t pa, start, end;
>>>>      struct memblock_region *reg;
>>>> +    static struct vm_struct vm_kernel = { 0 };
>>>>
>>>>      /* Set mmu_enabled flag */
>>>>      mmu_enabled = true;
>>>> @@ -467,10 +487,22 @@ static void __init setup_vm_final(void)
>>>>          for (pa = start; pa < end; pa += map_size) {
>>>>              va = (uintptr_t)__va(pa);
>>>>              create_pgd_mapping(swapper_pg_dir, va, pa,
>>>> -                       map_size, PAGE_KERNEL_EXEC);
>>>> +                       map_size, PAGE_KERNEL);
>>>>          }
>>>>      }
>>>>
>>>> +    /* Map the kernel */
>>>> +    create_kernel_page_table(swapper_pg_dir, PMD_SIZE);
>>>> +
>>>> +    /* Reserve the vmalloc area occupied by the kernel */
>>>> +    vm_kernel.addr = (void *)kernel_virt_addr;
>>>> +    vm_kernel.phys_addr = load_pa;
>>>> +    vm_kernel.size = (load_sz + PMD_SIZE - 1) & ~(PMD_SIZE - 1);
>>>> +    vm_kernel.flags = VM_MAP | VM_NO_GUARD;
>>>> +    vm_kernel.caller = __builtin_return_address(0);
>>>> +
>>>> +    vm_area_add_early(&vm_kernel);
>>>> +
>>>>      /* Clear fixmap PTE and PMD mappings */
>>>>      clear_fixmap(FIX_PTE);
>>>>      clear_fixmap(FIX_PMD);
>>>> diff --git a/arch/riscv/mm/physaddr.c b/arch/riscv/mm/physaddr.c
>>>> index e8e4dcd39fed..35703d5ef5fd 100644
>>>> --- a/arch/riscv/mm/physaddr.c
>>>> +++ b/arch/riscv/mm/physaddr.c
>>>> @@ -23,7 +23,7 @@ EXPORT_SYMBOL(__virt_to_phys);
>>>>
>>>>  phys_addr_t __phys_addr_symbol(unsigned long x)
>>>>  {
>>>> -    unsigned long kernel_start = (unsigned long)PAGE_OFFSET;
>>>> +    unsigned long kernel_start = (unsigned long)kernel_virt_addr;
>>>>      unsigned long kernel_end = (unsigned long)_end;
>>>>
>>>>      /*
>>
>> Alex
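For context on the layout the quoted patch establishes -- kernel linked at VMALLOC_END - 2GB + 1, BPF JIT right after _end, modules from the end of the BPF region up to _start + 2GB -- the arithmetic can be checked with a small standalone sketch. The addresses below are invented for illustration; only the relationships between the regions mirror the patch:

```c
#include <assert.h>
#include <stdint.h>

#define SZ_2G    0x80000000ULL
#define SZ_128M  0x08000000ULL

/* Invented sv39-style addresses; only the region relationships mirror the patch. */
#define VMALLOC_END           0xffffffdfffffffffULL
#define KERNEL_VIRT_ADDR      (VMALLOC_END - SZ_2G + 1)        /* kernel link address */
#define KERNEL_SIZE           0x01000000ULL                    /* assume a 16 MiB image */
#define KERNEL_END            (KERNEL_VIRT_ADDR + KERNEL_SIZE) /* stands in for _end */

/* BPF JIT region starts right after the kernel image (cf. BPF_JIT_REGION_START). */
#define BPF_JIT_REGION_START  KERNEL_END
#define BPF_JIT_REGION_END    (BPF_JIT_REGION_START + SZ_128M)

/* Modules: from the end of the BPF region up to _start + 2GB
 * (cf. VMALLOC_MODULE_START / VMALLOC_MODULE_END). */
#define VMALLOC_MODULE_START  BPF_JIT_REGION_END
#define VMALLOC_MODULE_END    (KERNEL_VIRT_ADDR + SZ_2G)
```

With this carving, every byte of the module and BPF regions stays within 2GB of every byte of the kernel image, which is the +-2GB reach of PC-relative addressing that the commit message cites as the reason modules and BPF must be close to the kernel.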