Message-ID: <7f38018b-dc89-4d79-a309-149557796121@amazon.co.uk>
Date: Wed, 26 Feb 2025 15:14:14 +0000
Subject: Re: [PATCH v4 03/12] KVM: guest_memfd: Add flag to remove from direct map
From: Patrick Roy
To: David Hildenbrand
References: <20250221160728.1584559-1-roypat@amazon.co.uk> <20250221160728.1584559-4-roypat@amazon.co.uk> <8642de57-553a-47ec-81af-803280a360ec@amazon.co.uk>

On Wed, 2025-02-26 at 09:08 +0000, David Hildenbrand wrote:
> On 26.02.25 09:48, Patrick Roy wrote:
>>
>>
>> On Tue, 2025-02-25 at 16:54 +0000, David Hildenbrand wrote:
>>> On 21.02.25 17:07, Patrick Roy wrote:
>>>> Add KVM_GMEM_NO_DIRECT_MAP flag for KVM_CREATE_GUEST_MEMFD() ioctl. When
>>>> set, guest_memfd folios will be removed from the direct map after
>>>> preparation, with direct map entries only restored when the folios are
>>>> freed.
>>>>
>>>> To ensure these folios do not end up in places where the kernel cannot
>>>> deal with them, set AS_NO_DIRECT_MAP on the guest_memfd's struct
>>>> address_space if KVM_GMEM_NO_DIRECT_MAP is requested.
>>>>
>>>> Note that this flag causes removal of direct map entries for all
>>>> guest_memfd folios independent of whether they are "shared" or "private"
>>>> (although current guest_memfd only supports either all folios in the
>>>> "shared" state, or all folios in the "private" state if
>>>> !IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM)).
>>>> The use case for also removing direct map entries of the shared parts
>>>> of guest_memfd is a special type of non-CoCo VM where host userspace
>>>> is trusted to have access to all of guest memory, but where
>>>> Spectre-style transient execution attacks through the host kernel's
>>>> direct map should still be mitigated.
>>>>
>>>> Note that KVM retains access to guest memory via userspace
>>>> mappings of guest_memfd, which are reflected back into KVM's memslots
>>>> via userspace_addr. This is needed for things like MMIO emulation on
>>>> x86_64 to work. Previous iterations attempted to instead have KVM
>>>> temporarily restore direct map entries whenever such an access to guest
>>>> memory was needed, but this turned out to have a significant performance
>>>> impact, as well as additional complexity due to needing to refcount
>>>> direct map reinsertion operations and making them play nicely with gmem
>>>> truncations.
>>>>
>>>> This iteration also doesn't have KVM perform TLB flushes after direct
>>>> map manipulations. This is because TLB flushes resulted in an up to 40x
>>>> elongation of page faults in guest_memfd (scaling with the number of CPU
>>>> cores), or a 5x elongation of memory population. On the one hand, TLB
>>>> flushes are not needed for functional correctness (the virt->phys
>>>> mapping technically stays "correct", the kernel should simply not use
>>>> it for a while), so this is a correct optimization to make. On the other
>>>> hand, it means that the desired protection from Spectre-style attacks is
>>>> not perfect, as an attacker could try to prevent a stale TLB entry from
>>>> getting evicted, keeping it alive until the page it refers to is used by
>>>> the guest for some sensitive data, and then targeting it using a
>>>> Spectre gadget.
>>>>
>>>> Signed-off-by: Patrick Roy
>>>
>>> ...
>>>
>>>>
>>>> +static bool kvm_gmem_test_no_direct_map(struct inode *inode)
>>>> +{
>>>> +	return ((unsigned long) inode->i_private) & KVM_GMEM_NO_DIRECT_MAP;
>>>> +}
>>>> +
>>>>  static inline void kvm_gmem_mark_prepared(struct folio *folio)
>>>>  {
>>>> +	struct inode *inode = folio_inode(folio);
>>>> +
>>>> +	if (kvm_gmem_test_no_direct_map(inode)) {
>>>> +		int r = set_direct_map_valid_noflush(folio_page(folio, 0), folio_nr_pages(folio),
>>>> +						     false);
>>>
>>> Will this work if KVM is built as a module, or is this another good
>>> reason why we might want guest_memfd core part of core-mm?
>>
>> mh, I'm admittedly not too familiar with the differences that would come
>> from building KVM as a module vs not. I do remember something about the
>> direct map accessors not being available for modules, so this would
>> indeed not work. Does that mean moving gmem into core-mm will be a
>> pre-requisite for the direct map removal stuff?
>
> Likely, we'd need some shim.
>
> Maybe for the time being it could be fenced using #if IS_BUILTIN() ...
> but that sure won't win in a beauty contest.

Is anyone working on such a shim at the moment? Otherwise, would it make
sense for me to look into it? (Although I'll probably need a pointer or
two on what is actually needed.)

I saw your comment on Fuad's series [1] indicating that he'll also need
some shim, so it probably makes sense to tackle it anyway instead of
hacking around it with #if-ery.

[1]: https://lore.kernel.org/kvm/8ddab670-8416-47f2-a5a6-94fb3444f328@redhat.com/

> --
> Cheers,
>
> David / dhildenb
>

Best,
Patrick