From: David Hildenbrand <david@redhat.com>
Date: Wed, 22 Jan 2025 15:25:29 +0100
Subject: Re: [PATCH RFC v1 0/5] KVM: gmem: 2MB THP support and preparedness tracking changes
To: "Shah, Amit", kvm@vger.kernel.org, "Roth, Michael"
Cc: liam.merwick@oracle.com, seanjc@google.com, jroedel@suse.de, linux-mm@kvack.org, "Sampat, Pratik Rajesh", linux-kernel@vger.kernel.org, "Lendacky, Thomas", vbabka@suse.cz, pbonzini@redhat.com, linux-coco@lists.linux.dev, quic_eberman@quicinc.com, "Kalra, Ashish", ackerleytng@google.com, vannapurve@google.com
Message-ID: <6e55db63-debf-41e6-941e-04690024d591@redhat.com>
In-Reply-To: <3bd7936624b11f755608b1c51cc1376ebf2c3a4f.camel@amd.com>
References: <20241212063635.712877-1-michael.roth@amd.com> <11280705-bcb1-4a5e-a689-b8a5f8a0a9a6@redhat.com> <3bd7936624b11f755608b1c51cc1376ebf2c3a4f.camel@amd.com>
>> Sorry for the late reply, it's been a couple of crazy weeks, and I'm
>> trying to give at least some feedback on stuff in my inbox before
>> even more will pile up over Christmas :) . Let me summarize my
>> thoughts:
>
> My turn for the lateness - back from a break.
>
> I should also preface that Mike is off for at least a month more, but
> he will return to continue working on this. In the meantime, I've had
> a chat with him about this work to keep the discussion alive on the
> lists.

So now it's my turn to be late again ;) As promised during the last
call, a few points from my side.

>
>> THPs in Linux rely on the following principles:
>>
>> (1) We try allocating a THP; if that fails, we rely on khugepaged to
>>     fix it up later (shmem+anon). So if we cannot grab a free THP, we
>>     defer it to a later point.
>>
>> (2) We try to be as transparent as possible: punching a hole will
>>     usually destroy the THP (either immediately for shmem/pagecache,
>>     or deferred for anon memory) to free up the now-free pages.
>>     That's different from hugetlb, where partial hole-punching will
>>     only zero out the memory; the partial memory will not get freed
>>     up and will get reused later.
>>
>>     Destroying a THP for shmem/pagecache only works if there are no
>>     unexpected page references, so there can be cases where we fail
>>     to free up memory. For the pagecache that's not really an issue,
>>     because memory reclaim will fix that up at some point. For shmem,
>>     there were discussions about scanning for zeroed pages and
>>     freeing them up during memory reclaim, just like we do now for
>>     anon memory as well.
>>
>> (3) Memory compaction is vital for guaranteeing that we will be able
>>     to create THPs the longer the system has been running.
>>
>> With guest_memfd we cannot rely on any daemon to fix it up for us
>> later as in (1) (that would require page migration support).
>
> True. And not having a huge page when requested to begin with (as in 1
> above) defeats the purpose entirely -- the point is to speed up
> SEV-SNP setup and guests by having fewer pages to work with.

Right.

>> We use truncate_inode_pages_range(), which will split a THP into
>> small pages if you partially punch-hole it, so (2) would apply;
>> splitting might fail as well in some cases if there are unexpected
>> references.
>>
>> I wonder what would happen if user space punched a hole in private
>> memory, making truncate_inode_pages_range() overwrite it with 0s if
>> splitting the THP failed (a memory write to private pages under
>> TDX?). Maybe something similar would happen if a private page got
>> zeroed out when freeing+reallocating it; I am not sure how that is
>> handled.
>>
>> guest_memfd currently actively works against (3) as soon as we (A)
>> fall back to allocating small pages or (B) split a THP due to hole
>> punching, as the remaining fragments cannot get reassembled anymore.
>>
>> I assume there is some truth to "hole-punching is a userspace
>> policy", but this mechanism will actively work against itself as
>> soon as you start falling back to small pages in any way.
>>
>> So I'm wondering if a better start would be to (A) always allocate
>> huge pages from the buddy (no fallback) and
>
> that sounds fine..
>
>> (B) partial punches are either disallowed or only zero out the
>> memory. But even a sequence of partial punches that covers the whole
>> huge page will not end up freeing all parts if splitting failed at
>> some point, which I quite dislike ...
>
> ... this basically just looks like hugetlb support (i.e. without the
> "transparent" part), doesn't it?

Yes, just using a different allocator until we have a predictable
allocator with reserves. Note that I am not sure how much "transparent"
here really applies, given the differences to THPs ...

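To make option (A) concrete, here is a minimal, hypothetical sketch of
what "allocate huge pages from the buddy, no fallback" could look like;
the function name and the exact gfp details are invented for
illustration and are not actual guest_memfd code:

	/*
	 * Hypothetical sketch only: try to allocate a 2M (PMD-order)
	 * folio directly from the buddy and let the caller fail
	 * instead of falling back to small folios.
	 */
	static struct folio *gmem_alloc_folio_2m(struct address_space *mapping)
	{
		/* Give up early instead of reclaiming aggressively. */
		gfp_t gfp = mapping_gfp_mask(mapping) |
			    __GFP_NORETRY | __GFP_NOWARN;

		/* Either a whole 2M folio or NULL; no small-page fallback. */
		return folio_alloc(gfp, HPAGE_PMD_ORDER);
	}

The caller would then either map the whole folio or fail the fault
visibly, rather than silently degrading to 4K pages and fragmenting
the 2M range.
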
>> But then we'd need memory preallocation, and I suspect to make this
>> really useful -- just like with 2M/1G "hugetlb" support -- in-place
>> shared<->private conversion will be a requirement ... at which point
>> we'd have reached the state where it's almost the 2M hugetlb support.
>
> Right, exactly.
>
>> This is not a very strong push back, more a "this does not quite
>> sound right to me", and I have the feeling that this might get in
>> the way of in-place shared<->private conversion; I might be wrong
>> about the latter though.

As discussed in the last bi-weekly MM meeting (and in contrast to what
I assumed), Vishal was right: we should be able to support in-place
shared<->private conversion as long as we can split a large folio when
any page of it is getting converted to shared. (The split is possible
if there are no unexpected folio references; private pages cannot be
GUP'ed, so it is feasible.)

So, similar to the hugetlb work, that split would happen, and it would
be a bit "easier", because ordinary folios (in contrast to hugetlb) are
prepared to be split.

So supporting larger folios for private memory might not make in-place
conversion significantly harder; the important part is that shared
folios may only be small. The split would just mean that we start
exposing individual small folios to the core-mm, not that we would
allow page migration for the shared parts etc. The "whole 2M chunk"
would remain allocated to guest_memfd.

> TBH my 2c are that getting hugepages supported and disabling THP for
> SEV-SNP guests will work fine.

Likely it will not be that easy as soon as hugetlb reserves etc. come
into play.

> But as Mike mentioned above, this series is to add a user on top of
> Paolo's work - and that seems more straightforward to experiment with
> and figure out hugepage support in general while getting all the
> other hugepage details done in parallel.

I would suggest not calling this "THP". Maybe we can call it "2M folio
support" for gmem. Similar to other FSes, we could just not limit
ourselves to 2M folios and simply allocate any large folios. But
sticking to 2M might be beneficial with regard to memory fragmentation
(below).

>> With memory compaction working for guest_memfd, it would all be
>> easier.
>
> ... btw do you know how well this is coming along?

People have been talking about that, but I suspect this is very
long-term material.

>> Note that I'm not quite sure about the "2MB" interface, should it be
>> a "PMD-size" interface?
>
> I think Mike and I touched upon this aspect too - and I may be
> misremembering - Mike suggested getting 1M, 2M, and bigger page sizes
> in increments -- and then fitting in PMD sizes when we've had enough
> of those. That is to say he didn't want to preclude it, or gate the
> PMD work on enabling all sizes first.

Starting with 2M is reasonable for now. The real question is how we
want to deal with

(a) not being able to allocate a 2M folio reliably, and
(b) partial discarding.

Using only (unmovable) 2M folios would effectively not cause any real
memory fragmentation in the system, because memory compaction operates
on 2M pageblocks on x86. So that feels quite compelling.

Ideally we'd have a 2M pagepool from which guest_memfd would allocate
pages and to which it would put back pages. Yes, this sounds similar to
hugetlb, but it might be much easier to implement, because we are not
limited by some of the hugetlb design decisions (HVO, not being able to
partially map them, etc.).
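To illustrate the pagepool idea, a rough sketch under the assumption
that guest_memfd keeps ready 2M folios on a private free list and only
takes from / returns to the buddy in whole 2M units; all gmem_pool_*
names are invented for illustration and none of this is existing code:

	/* Hypothetical 2M pagepool sketch; names invented. */
	struct gmem_pool {
		spinlock_t lock;
		struct list_head free_folios;	/* preallocated 2M folios */
		unsigned long nr_free;
	};

	/* Hand out a whole 2M folio; NULL means the pool is exhausted. */
	static struct folio *gmem_pool_alloc(struct gmem_pool *pool)
	{
		struct folio *folio = NULL;

		spin_lock(&pool->lock);
		if (!list_empty(&pool->free_folios)) {
			folio = list_first_entry(&pool->free_folios,
						 struct folio, lru);
			list_del_init(&folio->lru);
			pool->nr_free--;
		}
		spin_unlock(&pool->lock);
		return folio;
	}

	/* Return a fully truncated 2M folio to the pool, not the buddy. */
	static void gmem_pool_putback(struct gmem_pool *pool,
				      struct folio *folio)
	{
		spin_lock(&pool->lock);
		list_add(&folio->lru, &pool->free_folios);
		pool->nr_free++;
		spin_unlock(&pool->lock);
	}

A preallocated pool would address (a), and because folios would only
ever enter and leave the pool as whole 2M units, partial discards could
simply zero sub-ranges without splitting, which addresses (b).

--
Cheers,

David / dhildenb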