From: Vlastimil Babka
Date: Mon, 13 Feb 2023 15:23:36 +0100
Subject: Re: [PATCH v10 1/9] mm: Introduce memfd_restricted system call to create restricted user memory
To: "Kirill A. Shutemov"
Cc: "Huang, Kai", "chao.p.peng@linux.intel.com", "tglx@linutronix.de",
 "linux-arch@vger.kernel.org", "kvm@vger.kernel.org", "jmattson@google.com",
 "Hocko, Michal", "pbonzini@redhat.com", "ak@linux.intel.com",
 "Lutomirski, Andy", "linux-fsdevel@vger.kernel.org", "tabba@google.com",
 "david@redhat.com", "michael.roth@amd.com", "kirill.shutemov@linux.intel.com",
 "corbet@lwn.net", "qemu-devel@nongnu.org", "dhildenb@redhat.com",
 "bfields@fieldses.org", "linux-kernel@vger.kernel.org", "x86@kernel.org",
 "bp@alien8.de", "ddutile@redhat.com", "rppt@kernel.org", "shuah@kernel.org",
 "vkuznets@redhat.com", "mail@maciej.szmigiero.name", "naoya.horiguchi@nec.com",
 "qperret@google.com", "arnd@arndb.de", "linux-api@vger.kernel.org",
 "yu.c.zhang@linux.intel.com", "Christopherson,, Sean", "wanpengli@tencent.com",
 "vannapurve@google.com", "hughd@google.com", "aarcange@redhat.com",
 "mingo@redhat.com", "hpa@zytor.com", "Nakajima, Jun", "jlayton@kernel.org",
 "joro@8bytes.org", "linux-mm@kvack.org", "Wang, Wei W", "steven.price@arm.com",
 "linux-doc@vger.kernel.org", "Hansen, Dave", "akpm@linux-foundation.org",
 "linmiaohe@huawei.com"
Message-ID: <5d83c330-2697-b0a2-f55a-434b12bd81f8@suse.cz>
In-Reply-To: <20230123151803.lwbjug6fm45olmru@box>
References: <20221202061347.1070246-1-chao.p.peng@linux.intel.com>
 <20221202061347.1070246-2-chao.p.peng@linux.intel.com>
 <5c6e2e516f19b0a030eae9bf073d555c57ca1f21.camel@intel.com>
 <20221219075313.GB1691829@chaop.bj.intel.com>
 <20221220072228.GA1724933@chaop.bj.intel.com>
 <126046ce506df070d57e6fe5ab9c92cdaf4cf9b7.camel@intel.com>
 <20221221133905.GA1766136@chaop.bj.intel.com>
 <010a330c-a4d5-9c1a-3212-f9107d1c5f4e@suse.cz>
 <20230123151803.lwbjug6fm45olmru@box>

On 1/23/23 16:18, Kirill A. Shutemov wrote:
> On Mon, Jan 23, 2023 at 03:03:45PM +0100, Vlastimil Babka wrote:
>> On 12/22/22 01:37, Huang, Kai wrote:
>> >>> I argue that this page pinning (or page migration prevention) is not
>> >>> tied to where the page comes from, instead related to how the page will
>> >>> be used. Whether the page is restrictedmem backed or GUP() backed, once
>> >>> it's used by current version of TDX then the page pinning is needed.
>> >>> So such page migration prevention is really TDX thing, even not KVM generic
>> >>> thing (that's why I think we don't need change the existing logic of
>> >>> kvm_release_pfn_clean()).
>> >>>
>> > This essentially boils down to who "owns" page migration handling, and sadly,
>> > page migration is kinda "owned" by the core-kernel, i.e. KVM cannot handle page
>> > migration by itself -- it's just a passive receiver.
>> >
>> > For normal pages, page migration is totally done by the core-kernel (i.e. it
>> > unmaps page from VMA, allocates a new page, and uses migrate_page() or
>> > a_ops->migrate_page() to actually migrate the page).
>> > In the sense of TDX, conceptually it should be done in the same way. The more
>> > important thing is: yes KVM can use get_page() to prevent page migration, but
>> > when KVM wants to support it, KVM cannot just remove get_page(), as the
>> > core-kernel will still just do migrate_page() which won't work for TDX (given
>> > restricted_memfd doesn't have a_ops->migrate_page() implemented).
>> >
>> > So I think the restricted_memfd filesystem should own page migration handling,
>> > (i.e. by implementing a_ops->migrate_page() to either just reject page migration
>> > or somehow support it).
>>
>> While this thread seems to be settled on refcounts already, just wanted
>> to point out that it wouldn't be ideal to prevent migrations by
>> a_ops->migrate_page() rejecting them. It would mean cputime wasted (i.e.
>> by memory compaction) by isolating the pages for migration and then
>> releasing them after the callback rejects it (at least we wouldn't waste
>> time creating and undoing migration entries in the userspace page tables
>> as there's no mmap). Elevated refcount on the other hand is detected
>> very early in compaction so no isolation is attempted, so from that
>> aspect it's optimal.
>
> Hm. Do we need a new hook in a_ops to check if the page is migratable
> before going with longer path to migrate_page().
>
> Or maybe add AS_UNMOVABLE?

AS_UNMOVABLE should indeed allow a test in e.g. compaction to decide that
the page is not worth isolating in the first place.
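
To make the idea concrete, here is a rough userspace sketch, not kernel code.
AS_UNMOVABLE is only a proposal in this thread, so the flag bit, the struct
layouts and every function name below are made up for illustration. It only
models the ordering argued for above: the cheap tests (mapping flag, elevated
refcount) run before any isolation work would be done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only; these are NOT the kernel's definitions. */
enum { AS_UNMOVABLE_BIT_MODEL = 0 };    /* assumed bit for the proposed flag */

struct mapping_model {
    unsigned long flags;                /* per-mapping flag word */
};

struct page_model {
    struct mapping_model *mapping;      /* NULL if the page has no mapping */
    int refcount;                       /* references currently held */
    int expected_refs;                  /* references the scanner can account for */
};

/* Mark every page of this mapping as not worth migrating. */
static void mapping_set_unmovable_model(struct mapping_model *m)
{
    m->flags |= 1UL << AS_UNMOVABLE_BIT_MODEL;
}

static bool mapping_unmovable_model(const struct mapping_model *m)
{
    return (m->flags & (1UL << AS_UNMOVABLE_BIT_MODEL)) != 0;
}

/*
 * Cheap tests done before isolation: bail out on an unmovable mapping
 * or on an elevated refcount, so no isolate/putback cycle is wasted.
 */
static bool worth_isolating(const struct page_model *page)
{
    assert(page != NULL);

    if (page->mapping && mapping_unmovable_model(page->mapping))
        return false;
    if (page->refcount > page->expected_refs)
        return false;
    return true;
}

/* Count the pages a scanner would go on to isolate. */
static size_t count_isolation_candidates(const struct page_model *pages,
                                         size_t n)
{
    size_t candidates = 0;

    for (size_t i = 0; i < n; i++)
        if (worth_isolating(&pages[i]))
            candidates++;
    return candidates;
}
```

With a flag like this, a page in a restricted mapping would be skipped at the
same early point as a pinned page, rather than being isolated first and then
rejected by a migrate callback.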