Date: Thu, 22 Dec 2022 18:15:24 +0000
From: Sean Christopherson
To: Chao Peng
Cc: "Huang, Kai", "tglx@linutronix.de", "linux-arch@vger.kernel.org",
	"kvm@vger.kernel.org", "jmattson@google.com", "Lutomirski, Andy",
	"ak@linux.intel.com", "kirill.shutemov@linux.intel.com",
	"Hocko, Michal", "qemu-devel@nongnu.org", "tabba@google.com",
	"david@redhat.com", "michael.roth@amd.com", "corbet@lwn.net",
	"bfields@fieldses.org", "dhildenb@redhat.com",
	"linux-kernel@vger.kernel.org", "linux-fsdevel@vger.kernel.org",
	"x86@kernel.org", "bp@alien8.de", "linux-api@vger.kernel.org",
	"rppt@kernel.org", "shuah@kernel.org", "vkuznets@redhat.com",
	"vbabka@suse.cz", "mail@maciej.szmigiero.name", "ddutile@redhat.com",
	"qperret@google.com", "arnd@arndb.de", "pbonzini@redhat.com",
	"vannapurve@google.com", "naoya.horiguchi@nec.com",
	"wanpengli@tencent.com", "yu.c.zhang@linux.intel.com",
	"hughd@google.com", "aarcange@redhat.com", "mingo@redhat.com",
	"hpa@zytor.com", "Nakajima, Jun", "jlayton@kernel.org",
	"joro@8bytes.org", "linux-mm@kvack.org", "Wang, Wei W",
	"steven.price@arm.com", "linux-doc@vger.kernel.org", "Hansen, Dave",
	"akpm@linux-foundation.org", "linmiaohe@huawei.com"
Subject: Re: [PATCH v10 1/9] mm: Introduce memfd_restricted system call to
	create restricted user memory
References: <20221202061347.1070246-1-chao.p.peng@linux.intel.com>
	<20221202061347.1070246-2-chao.p.peng@linux.intel.com>
	<5c6e2e516f19b0a030eae9bf073d555c57ca1f21.camel@intel.com>
	<20221219075313.GB1691829@chaop.bj.intel.com>
	<20221220072228.GA1724933@chaop.bj.intel.com>
	<126046ce506df070d57e6fe5ab9c92cdaf4cf9b7.camel@intel.com>
	<20221221133905.GA1766136@chaop.bj.intel.com>
In-Reply-To: <20221221133905.GA1766136@chaop.bj.intel.com>

On Wed, Dec 21, 2022, Chao Peng wrote:
> On Tue, Dec 20, 2022 at 08:33:05AM +0000, Huang, Kai wrote:
> > On Tue, 2022-12-20 at 15:22 +0800, Chao Peng wrote:
> > > On Mon, Dec 19, 2022 at 08:48:10AM +0000, Huang, Kai wrote:
> > > > On Mon, 2022-12-19 at 15:53 +0800, Chao Peng wrote:
> > But for non-restricted-mem case, it is correct for KVM to decrease page's
> > refcount after setting up mapping in the secondary mmu, otherwise the page will
> > be pinned by KVM for normal VM (since KVM uses GUP to get the page).
>
> That's true. Actually even true for restrictedmem case, most likely we
> will still need the kvm_release_pfn_clean() for KVM generic code. On one
> side, other restrictedmem users like pKVM may not require page pinning
> at all. On the other side, see below.
> >
> > So what we are expecting is: for KVM if the page comes from restricted mem, then
> > KVM cannot decrease the refcount, otherwise for normal page via GUP KVM should.

No, requiring the user (KVM) to guard against lack of support for page migration
in restricted mem is a terrible API.  It's totally fine for restricted mem to not
support page migration until there's a use case, but punting the problem to KVM
is not acceptable.  Restricted mem itself doesn't yet support page migration,
e.g. explosions would occur even if KVM wanted to allow migration since there is
no notification to invalidate existing mappings.

> I argue that this page pinning (or page migration prevention) is not
> tied to where the page comes from, instead related to how the page will
> be used. Whether the page is restrictedmem backed or GUP() backed, once
> it's used by current version of TDX then the page pinning is needed. So
> such page migration prevention is really TDX thing, even not KVM generic
> thing (that's why I think we don't need change the existing logic of
> kvm_release_pfn_clean()). Wouldn't better to let TDX code (or who
> requires that) to increase/decrease the refcount when it populates/drops
> the secure EPT entries? This is exactly what the current TDX code does:

I agree that whether or not migration is supported should be controllable by the
user, but I strongly disagree on punting refcount management to KVM (or TDX).
The whole point of restricted mem is to support technologies like TDX and SNP;
accommodating their special needs for things like page migration should be part
of the API, not some footnote in the documentation.

It's not difficult to let the user communicate support for page migration, e.g.
if/when restricted mem gains support, add a hook to restrictedmem_notifier_ops
to signal support (or lack thereof) for page migration.  NULL == no migration,
non-NULL == migration allowed.

We know that supporting page migration in TDX and SNP is possible, and we know
that page migration will require a dedicated API since the backing store can't
memcpy() the page.  I don't see any reason to ignore that eventuality.

But again, unless I'm missing something, that's a future problem because restricted
mem doesn't yet support page migration regardless of the downstream user.
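
A rough, userspace-compilable sketch of the "NULL == no migration" idea, for
illustration only.  Only the name restrictedmem_notifier_ops comes from this
thread; the migrate_page hook, its signature, the list layout and
restrictedmem_can_migrate() are all hypothetical, and none of this is code from
the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct restrictedmem_notifier;

/* Hypothetical ops table; existing hooks (invalidation etc.) are elided. */
struct restrictedmem_notifier_ops {
	/*
	 * Hypothetical hook.  A user that can migrate a restricted page
	 * supplies it; the backing store cannot simply memcpy() the page.
	 * NULL == no migration, non-NULL == migration allowed.
	 */
	int (*migrate_page)(struct restrictedmem_notifier *n,
			    unsigned long old_pfn, unsigned long new_pfn);
};

/* Hypothetical per-user registration, chained in a singly linked list. */
struct restrictedmem_notifier {
	const struct restrictedmem_notifier_ops *ops;
	struct restrictedmem_notifier *next;
};

/*
 * Migration is permitted only if every registered user provides the hook.
 * With no users registered there is nothing to veto it, so this sketch
 * returns true for an empty list.
 */
bool restrictedmem_can_migrate(const struct restrictedmem_notifier *head)
{
	const struct restrictedmem_notifier *n;

	for (n = head; n; n = n->next) {
		assert(n->ops != NULL);
		if (!n->ops->migrate_page)
			return false;
	}
	return true;
}

/* Stub standing in for a user that knows how to move its own pages. */
int demo_migrate_page(struct restrictedmem_notifier *n,
		      unsigned long old_pfn, unsigned long new_pfn)
{
	(void)n;
	(void)old_pfn;
	(void)new_pfn;
	return 0;
}

/* A user that needs pages to stay put, e.g. something TDX-like today. */
const struct restrictedmem_notifier_ops pinned_user_ops = {
	.migrate_page = NULL,
};

/* A user that opts in to migration by providing the hook. */
const struct restrictedmem_notifier_ops migratable_user_ops = {
	.migrate_page = demo_migrate_page,
};
```

With something along these lines, the decision stays inside restricted mem: a
user that can't tolerate migration leaves the hook NULL, and nobody has to play
refcount games to pin pages.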