From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 19 Mar 2025 07:53:43 +0000
Subject: Re: [PATCH v4 03/12] KVM: guest_memfd: Add flag to remove from direct map
To: David Hildenbrand
CC: Elliot Berman
References: <20250221160728.1584559-1-roypat@amazon.co.uk>
 <20250221160728.1584559-4-roypat@amazon.co.uk>
 <8642de57-553a-47ec-81af-803280a360ec@amazon.co.uk>
 <7f38018b-dc89-4d79-a309-149557796121@amazon.co.uk>
 <9ffce724-23c9-4aa1-bc53-8292e1029991@redhat.com>
From: Patrick Roy
In-Reply-To: <9ffce724-23c9-4aa1-bc53-8292e1029991@redhat.com>

Hi David!

On Wed, 2025-02-26 at 15:30 +0000, David Hildenbrand wrote:
> On 26.02.25 16:14, Patrick Roy wrote:
>>
>>
>> On Wed, 2025-02-26 at 09:08 +0000, David Hildenbrand wrote:
>>> On 26.02.25 09:48, Patrick Roy wrote:
>>>>
>>>>
>>>> On Tue, 2025-02-25 at 16:54 +0000, David Hildenbrand wrote:
>>>>> On 21.02.25 17:07, Patrick Roy wrote:
>>>>>> Add KVM_GMEM_NO_DIRECT_MAP flag for KVM_CREATE_GUEST_MEMFD() ioctl. When
>>>>>> set, guest_memfd folios will be removed from the direct map after
>>>>>> preparation, with direct map entries only restored when the folios are
>>>>>> freed.
>>>>>>
>>>>>> To ensure these folios do not end up in places where the kernel cannot
>>>>>> deal with them, set AS_NO_DIRECT_MAP on the guest_memfd's struct
>>>>>> address_space if KVM_GMEM_NO_DIRECT_MAP is requested.
>>>>>>
>>>>>> Note that this flag causes removal of direct map entries for all
>>>>>> guest_memfd folios independent of whether they are "shared" or "private"
>>>>>> (although current guest_memfd only supports either all folios in the
>>>>>> "shared" state, or all folios in the "private" state if
>>>>>> !IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM)). The use case for also removing
>>>>>> direct map entries of the shared parts of guest_memfd is a special
>>>>>> type of non-CoCo VM where host userspace is trusted to have access to
>>>>>> all of guest memory, but where Spectre-style transient execution attacks
>>>>>> through the host kernel's direct map should still be mitigated.
>>>>>>
>>>>>> Note that KVM retains access to guest memory via userspace
>>>>>> mappings of guest_memfd, which are reflected back into KVM's memslots
>>>>>> via userspace_addr. This is needed for things like MMIO emulation on
>>>>>> x86_64 to work. Previous iterations attempted to instead have KVM
>>>>>> temporarily restore direct map entries whenever such an access to guest
>>>>>> memory was needed, but this turned out to have a significant performance
>>>>>> impact, as well as additional complexity due to needing to refcount
>>>>>> direct map reinsertion operations and making them play nicely with gmem
>>>>>> truncations.
>>>>>>
>>>>>> This iteration also doesn't have KVM perform TLB flushes after direct
>>>>>> map manipulations. This is because TLB flushes resulted in an up to 40x
>>>>>> elongation of page faults in guest_memfd (scaling with the number of CPU
>>>>>> cores), or a 5x elongation of memory population. On the one hand, TLB
>>>>>> flushes are not needed for functional correctness (the virt->phys
>>>>>> mapping technically stays "correct", the kernel should simply not use it
>>>>>> for a while), so this is a correct optimization to make. On the other
>>>>>> hand, it means that the desired protection from Spectre-style attacks is
>>>>>> not perfect, as an attacker could try to prevent a stale TLB entry from
>>>>>> getting evicted, keeping it alive until the page it refers to is used by
>>>>>> the guest for some sensitive data, and then targeting it using a
>>>>>> spectre-gadget.
>>>>>>
>>>>>> Signed-off-by: Patrick Roy
>>>>>
>>>>> ...
>>>>>
>>>>>>
>>>>>> +static bool kvm_gmem_test_no_direct_map(struct inode *inode)
>>>>>> +{
>>>>>> +       return ((unsigned long) inode->i_private) & KVM_GMEM_NO_DIRECT_MAP;
>>>>>> +}
>>>>>> +
>>>>>>   static inline void kvm_gmem_mark_prepared(struct folio *folio)
>>>>>>   {
>>>>>> +       struct inode *inode = folio_inode(folio);
>>>>>> +
>>>>>> +       if (kvm_gmem_test_no_direct_map(inode)) {
>>>>>> +               int r = set_direct_map_valid_noflush(folio_page(folio, 0), folio_nr_pages(folio),
>>>>>> +                                                    false);
>>>>>
>>>>> Will this work if KVM is built as a module, or is this another good
>>>>> reason why we might want guest_memfd core part of core-mm?
>>>>
>>>> mh, I'm admittedly not too familiar with the differences that would come
>>>> from building KVM as a module vs not. I do remember something about the
>>>> direct map accessors not being available for modules, so this would
>>>> indeed not work. Does that mean moving gmem into core-mm will be a
>>>> pre-requisite for the direct map removal stuff?
>>>
>>> Likely, we'd need some shim.
>>>
>>> Maybe for the time being it could be fenced using #if IS_BUILTIN() ...
>>> but that sure won't win in a beauty contest.
>>
>> Is anyone working on such a shim at the moment? Otherwise, would it make
>> sense for me to look into it? (although I'll probably need a pointer or
>> two for what is actually needed)
>>
>> I saw your comment on Fuad's series [1] indicating that he'll also need
>> some shim, so probably makes sense to tackle it anyway instead of
>> hacking around it with #if-ery.
>
> Elliot (CC) was working on the "guestmem library" project [1], but it was
> unclear what we could factor out into the core.
>
> Looks like a simple shim for such stuff might be a good starting point,
> although not the final idea of encapsulating more in the library.

So I started looking into this based on what we talked about at the last
guest_memfd sync. I tried to go the way you hinted at when this topic of
"direct map removal from modules" came up in the past [1], and hide it
behind some sort of "alloc/free" abstraction. E.g. have the library/shim
expose gmem_get_folio(struct inode *inode, pgoff_t index) that is roughly
equivalent to today's __kvm_gmem_get_pfn(), i.e. one that grabs a new folio
from the filemap, prepares it via a callback provided by KVM, and then
removes it from the direct map before returning it.

But then, that could still be "abused" by module code to remove arbitrary
folios from the direct map, if a caller doctored some arbitrary struct inode
to look sufficiently like a gmem inode for the purposes of gmem_get_folio().
But I also couldn't really come up with anything that _wouldn't_ allow
something like this. What're your thoughts on this? Do we need to find a way
to prevent this sort of thing, and is that even possible? I checked some of
Elliot's old submissions that contain direct map removal as part of the
library and they run into the same problem.

Best,
Patrick

[1]: https://lore.kernel.org/all/49d14780-56f4-478d-9f5f-0857e788c667@redhat.com/

> @Elliot, are you currently still looking into this?
>
> [1]
> https://lore.kernel.org/all/20241113-guestmem-library-v3-0-71fdee85676b@quicinc.com/T/#u
>
> --
> Cheers,
>
> David / dhildenb
>
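
To make the concern about gmem_get_folio() a bit more concrete, here is a
small userspace model of it. This is only a sketch and not kernel code.
Apart from the gmem_get_folio name, everything in it is made up for
illustration: the direct map is modelled as a bool array, the "does this
look like a gmem inode" test is assumed to be a comparison against a
hypothetical gmem_ops table, and unmap_arbitrary_page() stands in for a
misbehaving module.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 8

/* Userspace stand-in for the direct map: true means "page is mapped". */
static bool direct_mapped[NR_PAGES] = {
    true, true, true, true, true, true, true, true
};

/* Hypothetical ops table; its address is what marks an inode as gmem. */
struct inode_ops { int unused; };
static const struct inode_ops gmem_ops = { 0 };

/* Heavily simplified model of struct inode (not the kernel's layout). */
struct inode {
    const struct inode_ops *ops; /* how the shim recognises gmem inodes */
    size_t first_page;           /* first page backing this file */
};

/* Preparation callback supplied by the caller (KVM in the real design). */
typedef int (*gmem_prepare_fn)(size_t page);

/*
 * Model of the proposed helper: look up the page, let the caller prepare
 * it, then remove it from the direct map. Returns the page number on
 * success and -1 on failure.
 */
static int gmem_get_folio(struct inode *inode, size_t index,
                          gmem_prepare_fn prepare)
{
    size_t page;

    assert(inode != NULL);

    if (inode->ops != &gmem_ops)    /* the "looks like gmem" check */
        return -1;

    page = inode->first_page + index;
    if (page >= NR_PAGES)
        return -1;

    if (prepare && prepare(page))
        return -1;

    direct_mapped[page] = false;    /* direct map removal, no TLB flush */
    return (int)page;
}

/* Trivial preparation callback that always succeeds. */
static int noop_prepare(size_t page)
{
    (void)page;
    return 0;
}

/*
 * What a misbehaving module could do: build a fake inode around any page
 * it likes and hand it to the shim, which cannot tell the difference.
 */
static int unmap_arbitrary_page(size_t victim)
{
    struct inode forged = { .ops = &gmem_ops, .first_page = victim };

    return gmem_get_folio(&forged, 0, noop_prepare);
}
```

In this model the forged inode passes the only check the shim has, so
unmap_arbitrary_page() removes the victim page just like a legitimate
caller would. In a real kernel the module would presumably copy whatever
the shim checks from a genuine gmem inode rather than name the symbol
directly, but the outcome is the same.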