Date: Tue, 21 Jul 2020 16:36:26 -0700 (PDT)
Subject: Re: [PATCH v5 1/4] riscv: Move kernel mapping to vmalloc zone
In-Reply-To: <54af168083aee9dbda1b531227521a26b77ba2c8.camel@kernel.crashing.org>
CC: alex@ghiti.fr, mpe@ellerman.id.au, paulus@samba.org, Paul Walmsley,
    aou@eecs.berkeley.edu, Anup Patel, Atish Patra, zong.li@sifive.com,
    linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
    linux-riscv@lists.infradead.org, linux-mm@kvack.org
From: Palmer Dabbelt
To: benh@kernel.crashing.org

On Tue, 21 Jul 2020 16:11:02 PDT (-0700), benh@kernel.crashing.org wrote:
> On Tue, 2020-07-21 at 14:36 -0400, Alex Ghiti wrote:
>> > > I guess I don't understand why this is necessary at all.
>> > > Specifically: why
>> > > can't we just relocate the kernel within the linear map? That would
>> > > let the
>> > > bootloader put the kernel wherever it wants, modulo the physical
>> > > memory size we
>> > > support. We'd need to handle the regions that are coupled to the
>> > > kernel's
>> > > execution address, but we could just put them in an explicit memory
>> > > region
>> > > which is what we should probably be doing anyway.
>> >
>> > Virtual relocation in the linear mapping requires to move the kernel
>> > physically too. Zong implemented this physical move in its KASLR RFC
>> > patchset, which is cumbersome since finding an available physical spot
>> > is harder than just selecting a virtual range in the vmalloc range.
>> >
>> > In addition, having the kernel mapping in the linear mapping prevents
>> > the use of hugepage for the linear mapping resulting in performance loss
>> > (at least for the GB that encompasses the kernel).
>> >
>> > Why do you find this "ugly" ? The vmalloc region is just a bunch of
>> > available virtual addresses to whatever purpose we want, and as noted by
>> > Zong, arm64 uses the same scheme.
>
> I don't get it :-)
>
> At least on powerpc we move the kernel in the linear mapping and it
> works fine with huge pages, what is your problem there ? You rely on
> punching small-page size holes in there ?

That was my original suggestion, and I'm not actually sure it's invalid. It
would mean that both the kernel's physical and virtual addresses are set by
the bootloader, which may or may not be workable if we want to have an
sv48+sv39 kernel. My initial approach to sv48+sv39 kernels would be to just
throw away the sv39 memory on sv48 kernels, which would preserve the linear
map but mean that there is no single physical address that's accessible for
both. That would require some coordination between the bootloader and the
kernel as to where it should be loaded, but maybe there's a better way to
design the linear map. Right now we have a bunch of unwritten rules about
where things need to be loaded, which is a recipe for disaster.

We could copy the kernel around, but I'm not sure I really like that idea.
We do zero the BSS right now, so it's not like we entirely rely on the
bootloader to set up the kernel image, but with the hart race boot scheme we
have right now we'd at least need to leave a stub sitting around.
Maybe we just throw away SBI v0.1, though, that's why we called it all
legacy in the first place.

My bigger worry is that anything that involves running the kernel at
arbitrary virtual addresses means we need a PIC kernel, which means every
global symbol needs an indirection. That's probably not so bad for shared
libraries, but the kernel has a lot of global symbols. PLT references
probably aren't so scary, as we have an incoherent instruction cache so the
virtual function predictor isn't that hard to build, but making all global
data accesses GOT-relative seems like a disaster for performance. This
fixed-VA thing really just exists so we don't have to be full-on PIC.

In theory I think we could just get away with pretending that medany is PIC,
which I believe works as long as the data and text offset stays constant,
you don't have any symbols between 2GiB and -2GiB (as those may stay fixed,
even in medany), and you deal with GP accordingly (which should work itself
out in the current startup code). We rely on this for some of the early boot
code (and will soon for kexec), but that's a very controlled code base and
we've already had some issues. I'd be much more comfortable adding an
explicit semi-PIC code model, as I tend to miss something when doing these
sorts of things and then we could at least add it to the GCC test runs and
guarantee it actually works. Not really sure I want to deal with that,
though. It would, however, be the only way to get random virtual addresses
during kernel execution.

> At least in the old days, there were a number of assumptions that
> the kernel text/data/bss resides in the linear mapping.

Ya, it terrified me as well. Alex says arm64 puts the kernel in the vmalloc
region, so assuming that's the case it must be possible.
I didn't get that from reading the arm64 port (I guess it's no secret that
pretty much all I do is copy their code).

> If you change that you need to ensure that it's still physically
> contiguous and you'll have to tweak __va and __pa, which might induce
> extra overhead.

I'm operating under the assumption that we don't want to add an additional
load to virt2phys conversions. arm64 bends over backwards to avoid the load,
and I'm assuming they have a reason for doing so. Of course, if we're PIC
then maybe performance just doesn't matter, but I'm not sure I want to just
give up. Distros will probably build the sv48+sv39 kernels as soon as they
show up, even if there's no sv48 hardware for a while.
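
To make the virt2phys concern above a bit more concrete, here is a minimal
sketch in C of what the conversion might look like once the kernel image is
mapped outside the linear map. Everything in it is an assumption made for
illustration: the constants PAGE_OFFSET and KERNEL_LINK_ADDR, the variables
phys_ram_base and kernel_load_pa, and the function names are hypothetical
and are not taken from the riscv or arm64 ports. The point is only that the
conversion gains a range check, and that any offset kept in a variable
rather than a build-time constant costs a load on each conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout constants, chosen only for this example. */
#define PAGE_OFFSET      UINT64_C(0xffffffe000000000) /* start of the linear map */
#define KERNEL_LINK_ADDR UINT64_C(0xffffffff80000000) /* kernel image mapping */

/*
 * Hypothetical boot-time values. Because they are variables, every
 * conversion below has to load them from memory.
 */
static uint64_t phys_ram_base  = UINT64_C(0x80000000); /* first byte of RAM */
static uint64_t kernel_load_pa = UINT64_C(0x80200000); /* where the image sits */

/* Address inside the linear map: a fixed offset from physical memory. */
static inline uint64_t linear_va_to_pa(uint64_t va)
{
    return va - PAGE_OFFSET + phys_ram_base;
}

/* Address inside the separate kernel image mapping. */
static inline uint64_t kernel_va_to_pa(uint64_t va)
{
    return va - KERNEL_LINK_ADDR + kernel_load_pa;
}

/*
 * Combined conversion: with the image outside the linear map there are
 * two ranges to tell apart, so a compare and branch are added.
 */
static inline uint64_t virt_to_phys_sketch(uint64_t va)
{
    if (va >= KERNEL_LINK_ADDR)
        return kernel_va_to_pa(va);

    assert(va >= PAGE_OFFSET); /* anything else is not a mapped kernel VA here */
    return linear_va_to_pa(va);
}
```

With these made-up values, virt_to_phys_sketch(PAGE_OFFSET + 0x1000) goes
through the linear-map path and virt_to_phys_sketch(KERNEL_LINK_ADDR + 0x1000)
goes through the kernel-mapping path. If both offsets were build-time
constants, the loads would disappear and only the compare would remain.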