From: Vishal Annapurve
Date: Fri, 29 Mar 2024 11:38:49 -0700
Subject: Re: folio_mmapped
To: David Hildenbrand
Cc: Quentin Perret, Will Deacon, Sean Christopherson, Matthew Wilcox,
 Fuad Tabba, kvm@vger.kernel.org, kvmarm@lists.linux.dev,
 pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
 anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
 aou@eecs.berkeley.edu, viro@zeniv.linux.org.uk, brauner@kernel.org,
 akpm@linux-foundation.org, xiaoyao.li@intel.com, yilun.xu@intel.com,
 chao.p.peng@linux.intel.com, jarkko@kernel.org, amoorthy@google.com,
 dmatlack@google.com, yu.c.zhang@linux.intel.com,
 isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
 ackerleytng@google.com, mail@maciej.szmigiero.name,
 michael.roth@amd.com, wei.w.wang@intel.com, liam.merwick@oracle.com,
 isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com,
 suzuki.poulose@arm.com, steven.price@arm.com,
 quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
 quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
 quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
 catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com,
 oliver.upton@linux.dev, maz@kernel.org, keirf@google.com,
 linux-mm@kvack.org
References: <7470390a-5a97-475d-aaad-0f6dfb3d26ea@redhat.com>
 <40f82a61-39b0-4dda-ac32-a7b5da2a31e8@redhat.com>
 <20240319143119.GA2736@willie-the-truck>
 <2d6fc3c0-a55b-4316-90b8-deabb065d007@redhat.com>
 <20240327193454.GB11880@willie-the-truck>
 <5cec1f98-17a5-4120-bbf4-b487c2caf92c@redhat.com>
 <3448a9d6-58a8-475f-aff6-a39a62eee8c1@redhat.com>
In-Reply-To: <3448a9d6-58a8-475f-aff6-a39a62eee8c1@redhat.com>

On Thu, Mar 28, 2024 at 4:41 AM David Hildenbrand wrote:
>
> ....
> >
> >> The whole reason I brought up the guest_memfd+memfd pair idea is that you
> >> would similarly be able to do the conversion in the kernel, BUT, you'd never
> >> be able to mmap+GUP encrypted pages.
> >>
> >> Essentially you're using guest_memfd for what it was designed for: private
> >> memory that is inaccessible.
> >
> > Ack, that sounds pretty reasonable to me. But I think we'd still want to
> > make sure the other users of guest_memfd have the _desire_ to support
> > huge pages, migration, swap (probably longer term), and related
> > features, otherwise I don't think a guest_memfd-based option will
> > really work for us :-)
>
> *Probably* some easy way to get hugetlb pages into a guest_memfd would
> be by allocating them for an memfd and then converting/moving them into
> the guest_memfd part of the "fd pair" on conversion to private :)
>
> (but the "partial shared, partial private" case is and remains the ugly
> thing that is hard and I still don't think it makes sense. Maybe it
> could be handles somehow in such a dual approach with some enlightment
> in the fds ... hard to find solutions for things that don't make any
> sense :P )
>

I would again emphasize that this usecase exists for Confidential VMs,
whether we like it or not.

1) TDX hardware allows usage of 1G pages to back guest memory.
2) Larger VMs benefit more from 1G pages, which would be the norm for
VMs exposing GPU/TPU devices.
3) Confidential VMs will need to share host resources with
non-confidential VMs that use 1G pages.
4) When using normal shmem/hugetlbfs files to back guest memory, this
usecase was achievable by just manipulating guest page tables
(although at the cost of host safety, which led to the invention of
guest_memfd). Something equivalent "might be possible" with
guest_memfd.

Without handling "partial shared, partial private", it is impractical
to support 1G pages for Confidential VMs (discounting any long-term
efforts to tame guest VMs into playing nice).

Maybe, to handle this usecase, all host-side shared memory usage of
guest_memfd (userspace, IOMMU, etc.) should be associated with (or
tracked via) file ranges rather than offsets within huge pages, as is
already done when faulting in private memory pages to populate guest
EPTs/NPTs.

Given the current guest behavior, the host MMU and IOMMU may have to
be forced to always map shared memory regions via 4KB mappings.

> I also do strongly believe that we want to see some HW-assisted
> migration support for guest_memfd pages. Swap, as you say, maybe in the
> long-term. After all, we're not interested in having MM features for
> backing memory that you could similarly find under Windows 95. Wait,
> that one did support swapping! :P
>
> But unfortunately, that's what the shiny new CoCo world currently
> offers. Well, excluding s390x secure execution, as discussed.
>
> --
> Cheers,
>
> David / dhildenb
>
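
To make the file-range tracking idea above a bit more concrete, here is a
rough userspace model in Python. It is only a sketch under stated
assumptions: RangeTracker, convert(), host_map_size() and
guest_private_map_size() are hypothetical names, not existing guest_memfd
or KVM interfaces, and a real implementation would live in the kernel and
look quite different. It assumes shared/private state is kept per 4KB file
offset, that the host only ever maps shared ranges at 4KB, and that the
guest-side private mapping can use the largest aligned block that is
entirely private.

```python
# Illustrative model only; every name here is hypothetical.
PAGE_4K = 4 << 10
PAGE_2M = 2 << 20
PAGE_1G = 1 << 30


class RangeTracker:
    """Tracks shared vs. private state per 4KB file offset of one file."""

    def __init__(self, size):
        assert size % PAGE_4K == 0
        self.size = size
        self.shared = set()  # 4KB page indices currently shared with the host

    def _bounds(self, offset, length):
        # Convert a byte range in the file to a half-open 4KB index range.
        assert offset % PAGE_4K == 0 and length % PAGE_4K == 0
        assert 0 <= offset and offset + length <= self.size
        return offset // PAGE_4K, (offset + length) // PAGE_4K

    def convert(self, offset, length, shared):
        """Mark a file range shared or private, independent of folio size."""
        lo, hi = self._bounds(offset, length)
        for idx in range(lo, hi):
            if shared:
                self.shared.add(idx)
            else:
                self.shared.discard(idx)

    def host_map_size(self, offset):
        """Assumption: host MMU/IOMMU maps shared ranges at 4KB only.

        Returns None for private offsets, which the host must not map.
        """
        return PAGE_4K if offset // PAGE_4K in self.shared else None

    def guest_private_map_size(self, offset):
        """Largest aligned block around offset that is entirely private."""
        for size in (PAGE_1G, PAGE_2M, PAGE_4K):
            base = offset - offset % size
            if base + size > self.size:
                continue
            lo, hi = self._bounds(base, size)
            if not any(lo <= idx < hi for idx in self.shared):
                return size
        return None  # the 4KB page at this offset is shared
```

With a single 1G file, converting one 4KB range to shared drops the
private mapping size for its neighbours while the rest of the file can
still use large mappings, which is the "partial shared, partial private"
situation described above.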