Date: Mon, 31 May 2021 23:07:12 +0300
From: "Kirill A. Shutemov"
To: Sean Christopherson
Cc: "Kirill A. Shutemov", Dave Hansen, Andy Lutomirski, Peter Zijlstra,
    Jim Mattson, David Rientjes, "Edgecombe, Rick P", "Kleen, Andi",
    "Yamahata, Isaku", Erdem Aktas, Steve Rutherford, Peter Gonda,
    David Hildenbrand, Chao Peng, x86@kernel.org, kvm@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFCv2 13/13] KVM: unmap guest memory using poisoned pages
Message-ID: <20210531200712.qjxghakcaj4s6ara@box.shutemov.name>

On Wed, May 26, 2021 at 07:46:52PM +0000, Sean Christopherson wrote:
> On Fri, May 21, 2021, Kirill A. Shutemov wrote:
> > Hi Sean,
> >
> > The core patch of the approach we've discussed before is below. It
> > introduce a new page type with the required semantics.
> >
> > The full patchset can be found here:
> >
> >  git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git kvm-unmapped-guest-only
> >
> > but only the patch below is relevant for TDX. QEMU patch is attached.
>
> Can you post the whole series?

I hoped to get it posted as part of TDX host enabling.

As it is, the feature is incomplete for pure KVM. I didn't implement on the
KVM side the checks that are provided by the TDX module/hardware, so nothing
prevents the same page from being added to multiple KVM instances.

> The KVM behavior and usage of FOLL_GUEST is very relevant to TDX.

The patch can be found here:

https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git/commit/?h=kvm-unmapped-guest-only&id=2cd6c2c20528696a46a2a59383ca81638bf856b5

> > CONFIG_HAVE_KVM_PROTECTED_MEMORY has to be changed to what is appropriate
> > for TDX and FOLL_GUEST has to be used in hva_to_pfn_slow() when running
> > TDX guest.
>
> This behavior in particular is relevant; KVM should provide FOLL_GUEST iff the
> access is private or the VM type doesn't differentiate between private and
> shared.

I added FOLL_GUEST if the KVM instance has the feature enabled.

On top of that, TDX-specific code has to check that the page is in fact
PageGuest() before inserting it into the private SEPT.

The scheme makes sure that user-accessible memory cannot be added as
private to a TD.

> > When page get inserted into private sept we must make sure it is
> > PageGuest() or SIGBUS otherwise.
>
> More KVM feedback :-)
>
> Ideally, KVM will synchronously exit to userspace with detailed information on
> the bad behavior, not do SIGBUS. Hopefully that infrastructure will be in place
> sooner than later.
>
> https://lkml.kernel.org/r/YKxJLcg/WomPE422@google.com

My experiments are still on v5.11, but I can rebase to whatever is needed
once the infrastructure hits upstream.

> > Inserting PageGuest() into shared is fine, but the page will not be accessible
> > from userspace.
>
> Even if it can be functionally fine, I don't think we want to allow KVM to map
> PageGuest() as shared memory. The only reason to map memory shared is to share
> it with something, e.g. the host, that doesn't have access to private memory, so
> I can't envision a use case.
>
> On the KVM side, it's trivially easy to omit FOLL_GUEST for shared memory, while
> always passing FOLL_GUEST would require manually zapping. Manual zapping isn't
> a big deal, but I do think it can be avoided if userspace must either remap the
> hva or define a new KVM memslot (new gpa->hva), both of which will automatically
> zap any existing translations.
>
> Aha, thought of a concrete problem. If KVM maps PageGuest() into shared memory,
> then KVM must ensure that the page is not mapped private via a different hva/gpa,
> and is not mapped _any_ other guest because the TDX-Module's 1:1 PFN:TD+GPA
> enforcement only applies to private memory. The explicit "VM_WRITE | VM_SHARED"
> requirement below makes me think this wouldn't be prevented.

Hm. I didn't realize that the TDX module doesn't prevent the same page from
being used as shared and private at the same time.

Omitting FOLL_GUEST for shared memory doesn't look like the right approach.
IIUC, it would require the kernel to track which memory is shared and which
is private, which defeats the purpose of the rework. I would rather enforce
!PageGuest() when the shared SEPT is populated, in addition to enforcing
PageGuest() for the private SEPT.

Do you see any problems with this?

> Oh, and the other nicety is that I think it would avoid having to explicitly
> handle PageGuest() memory that is being accessed from kernel/KVM, i.e. if all
> memory exposed to KVM must be !PageGuest(), then it is also eligible for
> copy_{to,from}_user().

For copy_{to,from}_user() this is enforced by setting PTE entries to
PROT_NONE.

Or do I miss your point?

>
> > Any feedback is welcome.
> >
> > -------------------------------8<-------------------------------------------
> >
> > From: "Kirill A. Shutemov"
> > Date: Fri, 16 Apr 2021 01:30:48 +0300
> > Subject: [PATCH] mm: Introduce guest-only pages
> >
> > PageGuest() pages are only allowed to be used as guest memory. Userspace
> > is not allowed read from or write to such pages.
> >
> > On page fault, PageGuest() pages produce PROT_NONE page table entries.
> > Read or write there will trigger SIGBUS. Access to such pages via
> > syscall leads to -EIO.
> >
> > The new mprotect(2) flag PROT_GUEST translates to VM_GUEST. Any page
> > fault to VM_GUEST VMA produces PageGuest() page.
> >
> > Only shared tmpfs/shmem mappings are supported.
> Is limiting this to tmpfs/shmem only for the PoC/RFC, or is it also expected to
> be the long-term behavior?

I expect it to be enough to cover all relevant cases, no?

Note that MAP_ANONYMOUS|MAP_SHARED also fits here.

-- 
 Kirill A. Shutemov