References: <20230419221716.3603068-1-atishp@rivosinc.com> <20230419221716.3603068-46-atishp@rivosinc.com> <69ba1760-a079-fd8f-b079-fcb01e3eedec@intel.com> <81c476f4-ef62-e4a6-0033-8a46a15379fd@intel.com>
From: Atish Kumar Patra
Date: Wed, 26 Apr 2023 13:32:38 +0530
Subject: Re: [RFC 45/48] RISC-V: ioremap: Implement for arch specific ioremap hooks
To: Dave Hansen
Cc: linux-kernel@vger.kernel.org, Rajnesh Kanwal, Alexandre Ghiti, Andrew Jones, Andrew Morton, Anup Patel, Atish Patra, Björn Töpel, Suzuki K Poulose, Will Deacon, Marc Zyngier, Sean Christopherson, linux-coco@lists.linux.dev, Dylan Reid, abrestic@rivosinc.com, Samuel Ortiz, Christoph Hellwig, Conor Dooley, Greg Kroah-Hartman, Guo Ren, Heiko Stuebner, Jiri Slaby, kvm-riscv@lists.infradead.org, kvm@vger.kernel.org, linux-mm@kvack.org, linux-riscv@lists.infradead.org, Mayuresh Chitale, Palmer Dabbelt, Paolo Bonzini, Paul Walmsley, Uladzislau Rezki

On Tue, Apr 25, 2023 at 6:40 PM Dave Hansen wrote:
>
> On 4/25/23 01:00, Atish Kumar Patra wrote:
> > On Mon, Apr 24, 2023 at 7:18 PM Dave Hansen wrote:
> >> On 4/21/23 12:24, Atish Kumar Patra wrote:
> >> I'm not _quite_ sure what "guest initiated" means.  But SEV and TDX
> >> don't require an ioremap hook like this.
> >> So, even if they *are* "guest initiated", the question still remains
> >> how they work without this patch, or what they are missing without it.
> >
> > Maybe I misunderstood your question earlier. Are you concerned about guests
> > invoking any MMIO region specific calls in the ioremap path or passing
> > that information to the host ?
>
> My concern is that I don't know why this patch is here.  There should be
> a very simple answer to the question: Why does RISC-V need this patch
> but x86 does not?
>
> > Earlier, I assumed the former but it seems you are also concerned
> > about the latter as well. Sorry for the confusion in that case.
> > The guest initiation is necessary while the host notification can be
> > made optional.
> > The "guest initiated" means the guest tells the TSM (equivalent of TDX
> > module in RISC-V) the MMIO region details.
> > The TSM keeps a track of this and any page faults that happen in that
> > region are forwarded
> > to the host by the TSM after the instruction decoding. Thus TSM can
> > make sure that only ioremapped regions are
> > considered MMIO regions. Otherwise, all memory outside the guest
> > physical region will be considered as the MMIO region.
>
> Ahh, OK, that's a familiar problem.  I see the connection to device
> filtering now.
>
> Is this functionality in the current set?  I went looking for it and all
> I found was the host notification side.
>

The current series doesn't have the guest filtering feature enabled.
However, we have implemented guest filtering, and it is maintained in a
separate tree:
https://github.com/rivosinc/linux/tree/cove-integration-device-filtering

We did not include those changes in this series because the TDX patches
are still undergoing development. We plan to rebase our changes once the
revised patches are available.

> Is this the only mechanism by which the guest tells the TSM which parts
> of the guest physical address space can be exposed to the host?
>

This is the current approach defined in the CoVE spec. The guest informs
the TSM about both shared memory and MMIO regions via dedicated SBI calls
(e.g. sbi_covg_[add/remove]_mmio_region and
sbi_covg_[share/unshare]_memory_region).

> For TDX and SEV, that information is inferred from a bit in the page
> tables.  Essentially, there are dedicated guest physical addresses that
> tell the hardware how to treat the mappings: should the secure page
> tables or the host's EPT/NPT be consulted?
>

Yes. We don't have such a mechanism defined in CoVE yet.
Having said that, there is nothing in the ISA to prevent it, and it is
doable. Some specific bits in the PTE could also be used to encode shared
and MMIO physical addresses. The TSM implementation would probably need a
software page walker in that case.

Are there any performance advantages of one approach over the other?
As I understand it, the PTE-bit approach saves some boot-time privilege
transitions and needs fewer ABIs, but adds the cost of a software page
walk on runtime faults.

> If that mechanism is different for RISC-V, it would go a long way to
> explaining why RISC-V needs this patch.
>
> > In the current CoVE implementation, that MMIO region information is also
> > passed to the host to provide additional flexibility. The host may
> > choose to do additional
> > sanity check and bail if the fault address does not belong to
> > requested MMIO regions without
> > going to the userspace. This is purely an optimization and may not be mandatory.
>
> Makes sense, thanks for the explanation.
>
> >>> It can be a subset of the region's host provided the layout. The
> >>> guest device filtering solution is based on this idea as well [1].
> >>>
> >>> [1] https://lore.kernel.org/all/20210930010511.3387967-1-sathyanarayanan.kuppuswamy@linux.intel.com/
> >>
> >> I don't really see the connection.  Even if that series was going
> >> forward (I'm not sure it is) there is no ioremap hook there.
> >> There's also no guest->host communication in that series.  The guest
> >> doesn't _tell_ the host where the MMIO is, it just declines to run
> >> code for devices that it didn't expect to see.
> >
> > This is a recent version of the above series from tdx github. This is
> > a WIP as well and has not been posted to
> > the mailing list. Thus, it may be going under revisions as well.
> > As per my understanding the above ioremap changes for TDX mark the
> > ioremapped pages as shared.
> > The guest->host communication happen in the #VE exception handler
> > where the guest converts this to a hypercall by invoking TDG.VP.VMCALL
> > with an EPT violation set. The host would emulate an MMIO address if
> > it gets an VMCALL with EPT violation.
> > Please correct me if I am wrong.
>
> Yeah, TDX does:
>
>  1. Guest MMIO access
>  2. Guest #VE handler (if the access faults)
>  3. Guest hypercall->host
>  4. Host fixes the fault
>  5. Hypercall returns, guest returns from #VE via IRET
>  6. Guest retries MMIO instruction
>
> From what you said, RISC-V appears to do:
>
>  1. Guest MMIO access
>  2. Host MMIO handler
>  3. Host handles the fault, returns
>  4. Guest retries MMIO instruction
>
> In other words, this mechanism does the same thing but short-circuits
> the trip through #VE and the hypercall.
>

Yes. Thanks for summarizing the TDX approach.

> What happens if this ioremap() hook is not in place?  Does the hardware
> (or TSM) generate an exception like TDX gets?  If so, it's probably
> possible to move this "notify the TSM" code to that exception handler
> instead of needing an ioremap() hook.
>

We don't have a #VE-like exception mechanism in RISC-V.

> I'm not saying that it's _better_ to do that, but it would allow you to
> get rid of this patch for now and get me to shut up. :)
>
> > As I said above, the objective here is to notify the TSM where the
> > MMIO is. Notifying the host is just an optimization that we choose to
> > add.
> > In fact, in this series the KVM code doesn't do anything with
> > that information. The commit text probably can be improved to clarify
> > that.
>
> Just to close the loop here, please go take a look at
> pgprot_decrypted().  That's where the x86 guest page table bit gets to
> tell the hardware that the mapping might cause a #VE and is under the
> control of the host.  That's the extent of what x86 does at ioremap() time.
>
> So, to summarize, we have:
>
> x86:
>  1. Guest page table bit to mark shared (host) vs. private (guest)
>     control
>  2. #VE if there is a fault on a shared mapping to call into the host
>
> RISC-V:
>  1. Guest->TSM call to mark MMIO vs. private
>  2. Faults in the MMIO area are then transparent to the guest
>

Yup. This discussion raised a very valid design aspect of the CoVE spec.
To summarize, we need to investigate whether using PTE bits instead of
an additional ABI for managing shared/confidential/ioremapped pages makes
more sense.

Thanks for putting up with my answers, and for the feedback :).

> That design difference would, indeed, help explain why this patch is
> here.  I'm still not 100% convinced that the patch is *required*, but I
> at least understand how we arrived here.
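
For reference, the RISC-V side of the summary above can be modeled in a
few lines of user-space C. This is only a sketch under stated assumptions:
the names tsm_model, tsm_add_mmio_region() and tsm_classify_fault() are
hypothetical and do not correspond to the CoVE SBI interface or to any code
in this series. It only illustrates the idea that a fault is treated as
MMIO (and forwarded to the host) when it lands in a range the guest
registered beforehand, e.g. at ioremap() time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MMIO_REGIONS 8	/* arbitrary limit for this sketch */

/* One guest-registered MMIO range (hypothetical layout). */
struct mmio_region {
	uint64_t base;
	uint64_t len;
};

/* Toy stand-in for the per-guest state a TSM might keep. */
struct tsm_model {
	struct mmio_region regions[MAX_MMIO_REGIONS];
	size_t count;
};

enum fault_action {
	FAULT_FORWARD_TO_HOST,	/* address is in a registered MMIO range */
	FAULT_NOT_MMIO,		/* address was never registered as MMIO */
};

/*
 * Models the guest->TSM "add MMIO region" call.
 * Returns 0 on success, -1 for an empty or wrapping range or a full table.
 */
static int tsm_add_mmio_region(struct tsm_model *tsm, uint64_t base,
			       uint64_t len)
{
	if (len == 0 || base + len < base || tsm->count == MAX_MMIO_REGIONS)
		return -1;

	tsm->regions[tsm->count].base = base;
	tsm->regions[tsm->count].len = len;
	tsm->count++;
	return 0;
}

/* Decide how a guest page fault at guest physical address gpa is treated. */
static enum fault_action tsm_classify_fault(const struct tsm_model *tsm,
					    uint64_t gpa)
{
	for (size_t i = 0; i < tsm->count; i++) {
		const struct mmio_region *r = &tsm->regions[i];

		/* Subtract first so the range check cannot overflow. */
		if (gpa >= r->base && gpa - r->base < r->len)
			return FAULT_FORWARD_TO_HOST;
	}
	return FAULT_NOT_MMIO;
}
```

What happens in the FAULT_NOT_MMIO case is deliberately left open in this
model. With PTE bits instead of an explicit call, the lookup table would go
away and the classification would come from a page walk instead.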