Date: Fri, 22 Mar 2024 22:21:09 +0100
Subject: Re: folio_mmapped
From: David Hildenbrand <david@redhat.com>
Organization: Red Hat
To: Will Deacon
Cc: Sean Christopherson, Vishal Annapurve, Quentin Perret,
 Matthew Wilcox, Fuad Tabba, kvm@vger.kernel.org,
 kvmarm@lists.linux.dev, pbonzini@redhat.com, chenhuacai@kernel.org,
 mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com,
 palmer@dabbelt.com, aou@eecs.berkeley.edu, viro@zeniv.linux.org.uk,
 brauner@kernel.org, akpm@linux-foundation.org, xiaoyao.li@intel.com,
 yilun.xu@intel.com, chao.p.peng@linux.intel.com, jarkko@kernel.org,
 amoorthy@google.com, dmatlack@google.com, yu.c.zhang@linux.intel.com,
 isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
 ackerleytng@google.com, mail@maciej.szmigiero.name,
 michael.roth@amd.com, wei.w.wang@intel.com, liam.merwick@oracle.com,
 isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com,
 suzuki.poulose@arm.com, steven.price@arm.com,
 quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
 quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
 quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
 catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com,
 oliver.upton@linux.dev, maz@kernel.org, keirf@google.com,
 linux-mm@kvack.org
References: <7470390a-5a97-475d-aaad-0f6dfb3d26ea@redhat.com>
 <40f82a61-39b0-4dda-ac32-a7b5da2a31e8@redhat.com>
 <20240319143119.GA2736@willie-the-truck>
 <2d6fc3c0-a55b-4316-90b8-deabb065d007@redhat.com>
In-Reply-To: <2d6fc3c0-a55b-4316-90b8-deabb065d007@redhat.com>

On 22.03.24 18:52, David Hildenbrand wrote:
> On 19.03.24 15:31, Will Deacon wrote:
>> Hi David,
>
> Hi Will,
>
> sorry for the late reply!
>
>>
>> On Tue, Mar 19, 2024 at 11:26:05AM +0100, David Hildenbrand wrote:
>>> On 19.03.24 01:10, Sean Christopherson wrote:
>>>> On Mon, Mar 18, 2024, Vishal Annapurve wrote:
>>>>> On Mon, Mar 18, 2024 at 3:02 PM David Hildenbrand wrote:
>>>>>> Second, we should find better ways to let an IOMMU map these pages,
>>>>>> *not* using GUP. There were already discussions on providing a similar
>>>>>> fd+offset-style interface instead. GUP really sounds like the wrong
>>>>>> approach here. Maybe we should look into passing not only guest_memfd,
>>>>>> but also "ordinary" memfds.
>>>>
>>>> +1. I am not completely opposed to letting SNP and TDX effectively convert
>>>> pages between private and shared, but I also completely agree that letting
>>>> anything gup() guest_memfd memory is likely to end in tears.
>>>
>>> Yes. Avoid it right from the start, if possible.
>>>
>>> People wanted guest_memfd to *not* have to mmap guest memory ("even for
>>> ordinary VMs"). Now people are saying we have to be able to mmap it in order
>>> to GUP it. It's getting tiring, really.
>>
>> From the pKVM side, we're working on guest_memfd primarily to avoid
>> diverging from what other CoCo solutions end up using, but if it gets
>> de-featured (e.g. no huge pages, no GUP, no mmap) compared to what we do
>> today with anonymous memory, then it's a really hard sell to switch over
>> from what we have in production. We're also hoping that, over time,
>> guest_memfd will become more closely integrated with the mm subsystem to
>> enable things like hypervisor-assisted page migration, which we would
>> love to have.
>
> Reading Sean's reply, he has a different view on that. And I think
> that's the main issue: there are too many different use cases and too
> many different requirements that could turn guest_memfd into something
> that maybe it really shouldn't be.
>
>>
>> Today, we use the existing KVM interfaces (i.e. based on anonymous
>> memory) and it mostly works with the one significant exception that
>> accessing private memory via a GUP pin will crash the host kernel. If
>> all guest_memfd() can offer to solve that problem is preventing GUP
>> altogether, then I'd sooner just add that same restriction to what we
>> currently have instead of overhauling the user ABI in favour of
>> something which offers us very little in return.
>>
>> On the mmap() side of things for guest_memfd, a simpler option for us
>> than what has currently been proposed might be to enforce that the VMM
>> has unmapped all private pages on vCPU run, failing the ioctl if that's
>> not the case. It needs a little more tracking in guest_memfd but I think
>> GUP will then fall out in the wash because only shared pages will be
>> mapped by userspace and so GUP will fail by construction for private
>> pages.
>>
>> We're happy to pursue alternative approaches using anonymous memory if
>> you'd prefer to keep guest_memfd limited in functionality (e.g.
>> preventing GUP of private pages by extending mapping_flags as per [1]),
>> but we're equally willing to contribute to guest_memfd if extensions are
>> welcome.
>>
>> What do you prefer?
>
> Let me summarize the history:
>
> AMD had its thing running and it worked for them (but I recall it was
> hacky :) ).
>
> TDX made it possible to crash the machine when accessing secure memory
> from user space (MCE).
>
> So secure memory must not be mapped into user space -- no page tables.
> Prototypes with anonymous memory existed (and I didn't hate them,
> although hacky), but one of the other selling points of guest_memfd was
> that we could create VMs that wouldn't need any page tables at all,
> which I found interesting.
>
> There was a bit more to that (easier conversion, avoiding GUP,
> specifying on allocation that the memory was unmovable ...), but I'll
> get to that later.
>
> The design principle was: nasty private memory (unmovable, unswappable,
> inaccessible, un-GUPable) is allocated from guest_memfd, ordinary
> "shared" memory is allocated from an ordinary memfd.
>
> This makes sense: shared memory is neither nasty nor special. You can
> migrate it, swap it out, map it into page tables, GUP it, ... without
> any issues.
>
>
> So if I were to describe some key characteristics of guest_memfd as of
> today, it would probably be:
>
> 1) Memory is unmovable and unswappable. Right from the beginning, it is
>    allocated as unmovable (e.g., not placed on ZONE_MOVABLE, CMA, ...).
> 2) Memory is inaccessible. It cannot be read from user space or the
>    kernel, and it cannot be GUP'ed ... only some mechanisms (e.g.,
>    hibernation, /proc/kcore) might end up touching it "by accident",
>    and we usually can handle these cases.
> 3) Memory can be discarded at page granularity. There should be no cases
>    where you cannot discard memory to over-allocate memory for private
>    pages that have been replaced by shared pages otherwise.
> 4) Page tables are not required (well, it's a memfd), and the fd could
>    in theory be passed to other processes.
>
> Having "ordinary shared" memory in there implies that 1) and 2) will
> have to be adjusted for them, which kind-of turns it "partially" into
> ordinary shmem again.
>
>
> Going back to the beginning: with pKVM, we likely want the following:
>
> 1) Convert pages private<->shared in-place
> 2) Stop user space + kernel from accessing private memory in process
>    context. Likely for pKVM we would only crash the process, which
>    would be acceptable.
> 3) Prevent GUP to private memory. Otherwise we could crash the kernel.
> 4) Prevent private pages from swapout+migration until supported.
>
>
> I suspect your current solution with anonymous memory gets all but 3)
> sorted out, correct?
>
> I'm curious, may there be a requirement in the future that shared memory
> could be mapped into other processes? (thinking vhost-user and such
> things). Of course that's impossible with anonymous memory; teaching
> shmem to contain private memory would kind-of lead to ... guest_memfd,
> just that we don't have shared memory there.
>

I was just thinking of something stupid, not sure if it makes any sense.
I'll raise it here before I forget over the weekend.

... what if we glued one guest_memfd and a memfd (shmem) together in the
kernel somehow?

(1) A to-shared conversion moves a page from the guest_memfd to the memfd.

(2) A to-private conversion moves a page from the memfd to the guest_memfd.

Only the memfd can be mmap'ed/read/written/GUP'ed. Pages in the memfd
behave like any shmem pages: migratable, swappable etc.

Of course, (2) is only possible if the page is not pinned, not mapped (we
can unmap it). AND, the page must not reside on ZONE_MOVABLE / MIGRATE_CMA.

We'd have to decide what to do when we access a "hole" in the memfd --
instead of allocating a fresh page and filling the hole, we'd want to
SIGBUS.

-- 
Cheers,

David / dhildenb
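
To make the two conversion rules above a bit more concrete, here is a
rough user-space toy model in Python. It is only a sketch of the idea, not
kernel code and not a proposed interface: GluedPair, Page, Placement,
SigBus and ConversionRefused are hypothetical names made up for
illustration. It also assumes that a page sitting on ZONE_MOVABLE /
MIGRATE_CMA simply makes the to-private conversion fail; whether we would
rather migrate such a page first is left open.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Placement(Enum):
    UNMOVABLE = auto()  # acceptable for private pages
    MOVABLE = auto()    # stands in for ZONE_MOVABLE / MIGRATE_CMA


class SigBus(Exception):
    """Models SIGBUS on access to a hole in the memfd (hypothetical)."""


class ConversionRefused(Exception):
    """Models a to-private conversion that cannot proceed (hypothetical)."""


@dataclass
class Page:
    placement: Placement = Placement.UNMOVABLE
    pins: int = 0         # outstanding GUP-style pins
    mapped: bool = False  # mapped into user page tables


@dataclass
class GluedPair:
    """One guest_memfd and one memfd (shmem) glued together, as a toy."""
    guest_memfd: dict = field(default_factory=dict)  # offset -> private Page
    memfd: dict = field(default_factory=dict)        # offset -> shared Page

    def to_shared(self, offset: int) -> None:
        # (1) Move the page from the guest_memfd to the memfd.
        self.memfd[offset] = self.guest_memfd.pop(offset)

    def to_private(self, offset: int) -> None:
        # (2) Move the page from the memfd to the guest_memfd.
        page = self.memfd[offset]
        if page.pins:
            raise ConversionRefused("page is pinned")
        if page.placement is Placement.MOVABLE:
            # Assumption: refuse instead of migrating the page first.
            raise ConversionRefused("page is on ZONE_MOVABLE / MIGRATE_CMA")
        page.mapped = False  # "we can unmap it"
        self.guest_memfd[offset] = self.memfd.pop(offset)

    def touch(self, offset: int) -> Page:
        # mmap/read/write/GUP-style access: only the memfd is reachable.
        if offset not in self.memfd:
            # Hole: do not allocate a fresh page, signal instead.
            raise SigBus(f"hole at offset {offset}")
        page = self.memfd[offset]
        page.mapped = True
        return page
```

In this model, touching an offset that currently lives in the guest_memfd
raises SigBus, and a pinned page blocks to_private() until the pin is
dropped, which is the behavior described in (2) above.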