Date: Wed, 19 Feb 2025 23:33:28 +0000
In-Reply-To: <20250117163001.2326672-6-tabba@google.com> (message from Fuad Tabba on Fri, 17 Jan 2025 16:29:51 +0000)
Subject: Re: [RFC PATCH v5 05/15] KVM: guest_memfd: Folio mappability states and functions that manage their transition
From: Ackerley Tng
To: Fuad Tabba
Cc: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org, pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org, xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com, jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com, yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz, vannapurve@google.com, mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com, steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com, quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org, will@kernel.org, qperret@google.com, keirf@google.com, roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org, jgg@nvidia.com, rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com, hughd@google.com, jthoughton@google.com, tabba@google.com

Fuad Tabba writes:

This question should not block merging of this series since performance can be improved in a
separate series:

>
> +/*
> + * Marks the range [start, end) as mappable by both the host and the guest.
> + * Usually called when guest shares memory with the host.
> + */
> +static int gmem_set_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
> +{
> +	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
> +	void *xval = xa_mk_value(KVM_GMEM_ALL_MAPPABLE);
> +	pgoff_t i;
> +	int r = 0;
> +
> +	filemap_invalidate_lock(inode->i_mapping);
> +	for (i = start; i < end; i++) {

Were any alternative data structures considered, or does anyone have
suggestions for alternatives? Doing xa_store() in a loop here will take a
long time for large ranges. I looked into the following:

Option 1: (preferred) Maple trees

Maple tree has a nice API, though it would be better if it could combine
adjacent ranges that have the same value. I will have to dig into
performance, but I'm assuming that even large ranges are stored in just a
few nodes, so this should be faster than iterating over indices in an
xarray.

void explore_maple_tree(void)
{
	DEFINE_MTREE(mt);

	mt_init_flags(&mt, MT_FLAGS_LOCK_EXTERN | MT_FLAGS_USE_RCU);

	mtree_store_range(&mt, 0, 16, xa_mk_value(0x20), GFP_KERNEL);
	mtree_store_range(&mt, 8, 24, xa_mk_value(0x32), GFP_KERNEL);
	mtree_store_range(&mt, 5, 10, xa_mk_value(0x32), GFP_KERNEL);

	{
		void *entry;
		MA_STATE(mas, &mt, 0, 0);

		mas_for_each(&mas, entry, ULONG_MAX) {
			pr_err("[%ld, %ld]: 0x%lx\n",
			       mas.index, mas.last, xa_to_value(entry));
		}
	}

	mtree_destroy(&mt);
}

stdout:

[0, 4]: 0x20
[5, 10]: 0x32
[11, 24]: 0x32

Option 2: Multi-index xarray

The API is more complex than the maple tree's, and IIUC multi-index
xarrays are not generalizable to arbitrary ranges, so a range can't be,
for example, 8 1G pages + 1 4K page. The size of the range has to be a
power of 2 greater than 4K. Using multi-index xarrays would also mean
computing the order at which to store multi-index entries. The order can
be computed from the size of the range to be added, but that is an
additional source of errors.
Option 3: Interval trees, which are built on top of red-black trees

The API is lower-level: a macro is used to define an interval tree, the
user has to deal with tree nodes directly, and functions that override
sub-ranges within larger ranges have to be defined separately.

> +		r = xa_err(xa_store(mappable_offsets, i, xval, GFP_KERNEL));
> +		if (r)
> +			break;
> +	}
> +	filemap_invalidate_unlock(inode->i_mapping);
> +
> +	return r;
> +}
> +
> +/*
> + * Marks the range [start, end) as not mappable by the host. If the host doesn't
> + * have any references to a particular folio, then that folio is marked as
> + * mappable by the guest.
> + *
> + * However, if the host still has references to the folio, then the folio is
> + * marked and not mappable by anyone. Marking it is not mappable allows it to
> + * drain all references from the host, and to ensure that the hypervisor does
> + * not transition the folio to private, since the host still might access it.
> + *
> + * Usually called when guest unshares memory with the host.
> + */
> +static int gmem_clear_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
> +{
> +	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
> +	void *xval_guest = xa_mk_value(KVM_GMEM_GUEST_MAPPABLE);
> +	void *xval_none = xa_mk_value(KVM_GMEM_NONE_MAPPABLE);
> +	pgoff_t i;
> +	int r = 0;
> +
> +	filemap_invalidate_lock(inode->i_mapping);
> +	for (i = start; i < end; i++) {
> +		struct folio *folio;
> +		int refcount = 0;
> +
> +		folio = filemap_lock_folio(inode->i_mapping, i);
> +		if (!IS_ERR(folio)) {
> +			refcount = folio_ref_count(folio);
> +		} else {
> +			r = PTR_ERR(folio);
> +			if (WARN_ON_ONCE(r != -ENOENT))
> +				break;
> +
> +			folio = NULL;
> +		}
> +
> +		/* +1 references are expected because of filemap_lock_folio(). */
> +		if (folio && refcount > folio_nr_pages(folio) + 1) {
> +			/*
> +			 * Outstanding references, the folio cannot be faulted
> +			 * in by anyone until they're dropped.
> +			 */
> +			r = xa_err(xa_store(mappable_offsets, i, xval_none, GFP_KERNEL));
> +		} else {
> +			/*
> +			 * No outstanding references. Transition the folio to
> +			 * guest mappable immediately.
> +			 */
> +			r = xa_err(xa_store(mappable_offsets, i, xval_guest, GFP_KERNEL));
> +		}
> +
> +		if (folio) {
> +			folio_unlock(folio);
> +			folio_put(folio);
> +		}
> +
> +		if (WARN_ON_ONCE(r))
> +			break;
> +	}
> +	filemap_invalidate_unlock(inode->i_mapping);
> +
> +	return r;
> +}
> +