From: Vishal Annapurve
Date: Wed, 16 Oct 2024 16:18:49 +0530
Subject: Re: [RFC PATCH 26/39] KVM: guest_memfd: Track faultability within a struct kvm_gmem_private
To: David Hildenbrand
Cc: Ackerley Tng, Peter Xu, tabba@google.com, quic_eberman@quicinc.com, roypat@amazon.co.uk, jgg@nvidia.com, rientjes@google.com, fvdl@google.com, jthoughton@google.com, seanjc@google.com, pbonzini@redhat.com, zhiquan1.li@intel.com, fan.du@intel.com, jun.miao@intel.com, isaku.yamahata@intel.com, muchun.song@linux.dev, erdemaktas@google.com, qperret@google.com, jhubbard@nvidia.com, willy@infradead.org, shuah@kernel.org, brauner@kernel.org, bfoster@redhat.com, kent.overstreet@linux.dev, pvorel@suse.cz, rppt@kernel.org, richard.weiyang@gmail.com, anup@brainfault.org, haibo1.xu@intel.com, ajones@ventanamicro.com, vkuznets@redhat.com, maciej.wieczor-retman@intel.com, pgonda@google.com, oliver.upton@linux.dev, linux-kernel@vger.kernel.org, linux-mm@kvack.org, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org
In-Reply-To: <9abab5ad-98c0-48bb-b6be-59f2b3d3924a@redhat.com>

On Wed, Oct 16, 2024 at 2:20 PM David Hildenbrand wrote:
>
> >> I also don't know how you treat things like folio_test_hugetlb() on
> >> possible assumptions that the VMA must be a hugetlb vma. I'd confess I
> >> didn't yet check the rest of the patchset yet - reading a large series
> >> without a git tree is sometimes challenging to me.
> >>
> >
> > I'm thinking to basically never involve folio_test_hugetlb(), and the
> > VMAs used by guest_memfd will also never be a HugeTLB VMA. That's
> > because only the HugeTLB allocator is used, but by the time the folio is
> > mapped to userspace, it would have already have been split. After the
> > page is split, the folio loses its HugeTLB status. guest_memfd folios
> > will never be mapped to userspace while they still have a HugeTLB
> > status.
>
> We absolutely must convert these hugetlb folios to non-hugetlb folios.
>
> That is one of the reasons why I raised at LPC that we should focus on
> leaving hugetlb out of the picture and rather have a global pool, and
> the option to move folios from the global pool back and forth to hugetlb
> or to guest_memfd.
>
> How exactly that would look like is TBD.
>
> For the time being, I think we could add a "hack" to take hugetlb folios
> from hugetlb for our purposes, but we would absolutely have to convert
> them to non-hugetlb folios, especially when we split them to small
> folios and start using the mapcount. But it doesn't feel quite clean.

As hugepage folios need to be split up in order to support backing confidential computing (CoCo) VMs with hugepages, I would assume any folio-based hugepage memory allocation will need to go through split/merge cycles over the guest_memfd lifetime.

The plan for the next RFC series is to abstract out the hugetlb folio management within guest_memfd so that any hugetlb-specific logic is cleanly separated out, allowing guest_memfd to allocate memory from other hugepage allocators in the future.
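To make the allocator abstraction and the split/merge cycle a bit more concrete, here is a minimal userspace C sketch. It is a toy model under stated assumptions, not kernel code: gmem_hugepage_ops, gmem_huge_folio, gmem_set_shared() and every other name below are hypothetical, and real page tables, mapcounts and HVO are not modeled. It assumes a huge folio is split as soon as any subpage becomes shared and merged back once no subpage is shared, and that only an intact huge folio may be returned to the backing pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical: base pages per huge page (e.g. 2 MiB / 4 KiB). */
#define GMEM_SUBPAGES 512

/* Hypothetical stand-in for one huge folio owned by guest_memfd. */
struct gmem_huge_folio {
	bool split;                 /* currently broken into base-page folios? */
	unsigned int nr_shared;     /* number of subpages currently shared */
	bool shared[GMEM_SUBPAGES]; /* per-subpage shared state */
};

/*
 * Hypothetical allocator interface: guest_memfd core code calls only these
 * hooks, so hugetlb-specific logic stays in one backend and another
 * hugepage allocator could be plugged in later.
 */
struct gmem_hugepage_ops {
	struct gmem_huge_folio *(*alloc)(void);
	void (*free)(struct gmem_huge_folio *folio);
	void (*split)(struct gmem_huge_folio *folio);
	void (*merge)(struct gmem_huge_folio *folio);
};

/* Toy backend standing in for a hugetlb-backed pool; counts live folios. */
static int toy_in_use;

static struct gmem_huge_folio *toy_alloc(void)
{
	struct gmem_huge_folio *folio = calloc(1, sizeof(*folio));

	if (folio)
		toy_in_use++;
	return folio;
}

static void toy_free(struct gmem_huge_folio *folio)
{
	/* Only an intact (merged) huge folio may go back to the pool. */
	assert(!folio->split);
	toy_in_use--;
	free(folio);
}

static void toy_split(struct gmem_huge_folio *folio) { folio->split = true; }
static void toy_merge(struct gmem_huge_folio *folio) { folio->split = false; }

static const struct gmem_hugepage_ops toy_ops = {
	.alloc = toy_alloc,
	.free = toy_free,
	.split = toy_split,
	.merge = toy_merge,
};

/* Sharing any subpage requires the huge folio to be split first. */
static void gmem_set_shared(const struct gmem_hugepage_ops *ops,
			    struct gmem_huge_folio *folio, unsigned int idx)
{
	assert(idx < GMEM_SUBPAGES);
	if (folio->shared[idx])
		return;
	if (!folio->split)
		ops->split(folio);
	folio->shared[idx] = true;
	folio->nr_shared++;
}

/* Once no subpage is shared any more, merge the huge folio back. */
static void gmem_set_private(const struct gmem_hugepage_ops *ops,
			     struct gmem_huge_folio *folio, unsigned int idx)
{
	assert(idx < GMEM_SUBPAGES);
	if (!folio->shared[idx])
		return;
	folio->shared[idx] = false;
	if (--folio->nr_shared == 0)
		ops->merge(folio);
}

/* Release: make every subpage private (which merges), then free. */
static void gmem_release(const struct gmem_hugepage_ops *ops,
			 struct gmem_huge_folio *folio)
{
	for (unsigned int i = 0; i < GMEM_SUBPAGES; i++)
		gmem_set_private(ops, folio, i);
	ops->free(folio);
}
```

For example, sharing subpages 3 and 7 of a freshly allocated folio splits it; making both private again merges it, and gmem_release() hands it back to the toy pool. Swapping toy_ops for a different backend would leave the gmem_* helpers unchanged, which is the kind of separation the abstraction aims for.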
>
> Simply starting with a separate global pool (e.g., boot-time allocation
> similar to as done by hugetlb, or CMA) might be cleaner, and a lot of
> stuff could be factored out from hugetlb code to achieve that.

I am not sure a separate global pool necessarily solves all the issues here unless we come up with more concrete implementation details. One of the concerns is the ability to implement/retain HVO (HugeTLB Vmemmap Optimization) while transferring memory between the separate global pool and the hugetlb pool, i.e., whether the pool can seamlessly serve all hugepage users on the host. Another question is whether a separate pool/allocator simplifies the split/merge operations at runtime.

>
> --
> Cheers,
>
> David / dhildenb
>