Date: Fri, 11 Oct 2024 23:32:11 +0000
Subject: Re: [RFC PATCH 26/39] KVM: guest_memfd: Track faultability within a struct kvm_gmem_private
From: Ackerley Tng
To: Peter Xu
Cc: tabba@google.com, quic_eberman@quicinc.com, roypat@amazon.co.uk,
    jgg@nvidia.com, david@redhat.com, rientjes@google.com, fvdl@google.com,
    jthoughton@google.com, seanjc@google.com, pbonzini@redhat.com,
    zhiquan1.li@intel.com, fan.du@intel.com, jun.miao@intel.com,
    isaku.yamahata@intel.com, muchun.song@linux.dev, erdemaktas@google.com,
    vannapurve@google.com, qperret@google.com, jhubbard@nvidia.com,
    willy@infradead.org, shuah@kernel.org, brauner@kernel.org,
    bfoster@redhat.com, kent.overstreet@linux.dev, pvorel@suse.cz,
    rppt@kernel.org, richard.weiyang@gmail.com, anup@brainfault.org,
    haibo1.xu@intel.com, ajones@ventanamicro.com, vkuznets@redhat.com,
    maciej.wieczor-retman@intel.com, pgonda@google.com,
    oliver.upton@linux.dev, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, kvm@vger.kernel.org,
    linux-kselftest@vger.kernel.org
Content-Type: text/plain; charset="UTF-8"

Peter Xu writes:

> On Tue, Sep 10, 2024 at 11:43:57PM +0000, Ackerley Tng wrote:
>> The faultability xarray is stored on the inode since faultability is a
>> property of the guest_memfd's memory contents.
>>
>> In this RFC, presence of an entry in the xarray indicates faultable,
>> but this could be flipped so that presence indicates unfaultable. For
>> flexibility, a special value "FAULT" is used instead of a simple
>> boolean.
>>
>> However, at some stages of a VM's lifecycle there could be more
>> private pages, and at other stages there could be more shared pages.
>>
>> This is likely to be replaced by a better data structure in a future
>> revision to better support ranges.
>>
>> Also store struct kvm_gmem_hugetlb in struct kvm_gmem_hugetlb as a
>> pointer. inode->i_mapping->i_private_data.
>
> Could you help explain the difference between faultability v.s. the
> existing KVM_MEMORY_ATTRIBUTE_PRIVATE?  Not sure if I'm the only one who's
> confused, otherwise might be good to enrich the commit message.

Thank you for this question. I'll add this to the commit message in the
next revision if Fuad's patch set [1] doesn't make it first.

Reason (a): To elaborate on the explanation in [1],
KVM_MEMORY_ATTRIBUTE_PRIVATE is whether userspace wants this page to be
private or shared, and faultability is whether the page is allowed to be
faulted in by userspace.

These two are similar but may not be the same thing. In pKVM, pKVM
cannot trust userspace's configuration of private/shared, and other
information will go into determining the private/shared setting in
faultability.

Perhaps Fuad can elaborate more here.

Reason (b): In this patch series (mostly focused on x86 first), we're
using faultability to prevent any future faults before checking that
there are no mappings.

Having a different xarray from mem_attr_array allows us to disable
faulting before committing to changing mem_attr_array. Please see
`kvm_gmem_should_set_attributes_private()` in this patch [2].

We're not completely sure about the effectiveness of using faultability
to block off future faults here; in future revisions we may use a
different approach. The folio_lock() is probably important if we need to
check mapcount. Please let me know if you have any ideas!

The starting point of having a different xarray was pKVM's requirement
of having separate xarrays, and we later realized that the xarray could
be used for reason (b). For x86 we could perhaps eventually remove the
second xarray? Not sure as of now.

>
> The latter is per-slot, so one level higher, however I don't think it's a
> common use case for mapping the same gmemfd in multiple slots anyway for
> KVM (besides corner cases like live upgrade).  So perhaps this is not about
> layering but something else?  For example, any use case where PRIVATE and
> FAULTABLE can be reported with different values.
>
> Another higher level question is, is there any plan to support non-CoCo
> context for 1G?

I believe guest_memfd users are generally in favor of eventually using
guest_memfd for non-CoCo use cases, which means we do want 1G (shared,
in the case of CoCo) page support.

However, core-mm's fault path does not support mapping at anything
higher than the PMD level (other than hugetlb_fault(), which the
community wants to move away from), so core-mm wouldn't be able to map
1G pages taken from HugeTLB.

In this patch series, we always split pages before mapping them to
userspace, and that's how this series still works with core-mm.

Having 1G page support for shared memory or for non-CoCo use cases would
probably depend on better HugeTLB integration with core-mm, which you'd
be most familiar with.

Thank you for looking through our patches; we need your experience and
help! I've also just sent out the first 3 patches separately, which I
think are useful in improving understandability of the
resv_map/subpool/hstate reservation system in HugeTLB and can be
considered separately. Hope you can also review/comment on [4].

> I saw that you also mentioned you have working QEMU prototypes ready in
> another email.  It'll be great if you can push your kernel/QEMU's latest
> tree (including all dependency patches) somewhere so anyone can have a
> closer look, or play with it.

Vishal's reply [3] might have been a bit confusing. To clarify, my team
doesn't work with QEMU at all (we use a custom userspace VMM internally),
so the patches in this series are tested purely with selftests.

The selftests have fewer dependencies than full QEMU, and I'd be happy to
help with running them or to explain anything that I might have missed.

We don't have any QEMU prototypes and are not likely to be building any
prototypes in the foreseeable future.
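
To make the ordering in reason (b) more concrete, here is a minimal
userspace sketch of "disable faulting, check for mappings, then commit
the attribute". It is not the code from this series: every name below
(toy_gmem, toy_fault, toy_unmap, toy_set_private) is hypothetical, plain
arrays stand in for the faultability xarray and mem_attr_array, there is
no locking (so nothing here models folio_lock()), and the rollback on
failure is an assumption made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 8

/* Toy stand-ins for the two per-index trackers (hypothetical names). */
struct toy_gmem {
	bool faultable[NR_PAGES];    /* models the faultability xarray */
	bool attr_private[NR_PAGES]; /* models the PRIVATE bit in mem_attr_array */
	int mapcount[NR_PAGES];      /* models existing userspace mappings */
};

/* A userspace fault only succeeds while the page is marked faultable. */
static bool toy_fault(struct toy_gmem *g, size_t idx)
{
	if (idx >= NR_PAGES || !g->faultable[idx])
		return false;
	g->mapcount[idx]++;
	return true;
}

/* Drop one userspace mapping of the page. */
static void toy_unmap(struct toy_gmem *g, size_t idx)
{
	assert(idx < NR_PAGES && g->mapcount[idx] > 0);
	g->mapcount[idx]--;
}

/*
 * Convert [start, end) from shared to private:
 *   1. block new faults by clearing faultability,
 *   2. check that no mappings remain,
 *   3. only then commit the attribute change.
 * On failure, faultability is restored (an assumption of this sketch).
 */
static int toy_set_private(struct toy_gmem *g, size_t start, size_t end)
{
	bool saved[NR_PAGES];
	size_t i, j;

	if (start > end || end > NR_PAGES)
		return -EINVAL;

	for (i = start; i < end; i++) {
		saved[i] = g->faultable[i];
		g->faultable[i] = false;
	}

	for (i = start; i < end; i++) {
		if (g->mapcount[i] > 0) {
			for (j = start; j < end; j++)
				g->faultable[j] = saved[j];
			return -EBUSY;
		}
	}

	for (i = start; i < end; i++)
		g->attr_private[i] = true;
	return 0;
}
```

In this toy model, converting a range that still has a mapped page fails
with -EBUSY and leaves attr_private untouched; after the mapping is
dropped the same call succeeds, and later faults on that range are
refused while pages outside it stay faultable. Keeping the two trackers
separate is what lets step 1 happen before step 3.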

>
> Thanks,
>
> --
> Peter Xu

[1] https://lore.kernel.org/all/20241010085930.1546800-3-tabba@google.com/
[2] https://lore.kernel.org/all/f4ca1711a477a3b56406c05d125dce3d7403b936.1726009989.git.ackerleytng@google.com/
[3] https://lore.kernel.org/all/CAGtprH-GczOb64XrLpdW4ObRG7Gsv8tHWNhiW7=2dE=OAF7-Rw@mail.gmail.com/
[4] https://lore.kernel.org/all/cover.1728684491.git.ackerleytng@google.com/T/