From: Ackerley Tng
To: Fuad Tabba
Cc: kirill@shutemov.name, kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
 linux-mm@kvack.org, pbonzini@redhat.com, chenhuacai@kernel.org,
 mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com,
 palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com,
 viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org,
 akpm@linux-foundation.org, xiaoyao.li@intel.com, yilun.xu@intel.com,
 chao.p.peng@linux.intel.com, jarkko@kernel.org, amoorthy@google.com,
 dmatlack@google.com, yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com,
 mic@digikod.net, vbabka@suse.cz, vannapurve@google.com,
 mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com,
 wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com, steven.price@arm.com,
 quic_eberman@quicinc.com, quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
 quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
 yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org, will@kernel.org,
 qperret@google.com, keirf@google.com, roypat@amazon.co.uk, shuah@kernel.org,
 hch@infradead.org, jgg@nvidia.com, rientjes@google.com, jhubbard@nvidia.com,
 fvdl@google.com, hughd@google.com, jthoughton@google.com
Date: Thu, 06 Feb 2025 03:14:28 +0000
In-Reply-To: (message from Fuad Tabba on Mon, 20 Jan 2025 10:40:49 +0000)
Subject: Re: [RFC PATCH v5 05/15] KVM: guest_memfd: Folio mappability states and functions that manage their transition

Fuad Tabba writes:

> On Mon, 20 Jan 2025 at 10:30, Kirill A. Shutemov wrote:
>>
>> On Fri, Jan 17, 2025 at 04:29:51PM +0000, Fuad Tabba wrote:
>> > +/*
>> > + * Marks the range [start, end) as not mappable by the host. If the host doesn't
>> > + * have any references to a particular folio, then that folio is marked as
>> > + * mappable by the guest.
>> > + *
>> > + * However, if the host still has references to the folio, then the folio is
>> > + * marked as not mappable by anyone. Marking it as not mappable allows it to
>> > + * drain all references from the host, and to ensure that the hypervisor does
>> > + * not transition the folio to private, since the host still might access it.
>> > + *
>> > + * Usually called when the guest unshares memory with the host.
>> > + */
>> > +static int gmem_clear_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
>> > +{
>> > +	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
>> > +	void *xval_guest = xa_mk_value(KVM_GMEM_GUEST_MAPPABLE);
>> > +	void *xval_none = xa_mk_value(KVM_GMEM_NONE_MAPPABLE);
>> > +	pgoff_t i;
>> > +	int r = 0;
>> > +
>> > +	filemap_invalidate_lock(inode->i_mapping);
>> > +	for (i = start; i < end; i++) {
>> > +		struct folio *folio;
>> > +		int refcount = 0;
>> > +
>> > +		folio = filemap_lock_folio(inode->i_mapping, i);
>> > +		if (!IS_ERR(folio)) {
>> > +			refcount = folio_ref_count(folio);
>> > +		} else {
>> > +			r = PTR_ERR(folio);
>> > +			if (WARN_ON_ONCE(r != -ENOENT))
>> > +				break;
>> > +
>> > +			folio = NULL;
>> > +		}
>> > +
>> > +		/* +1 references are expected because of filemap_lock_folio(). */
>> > +		if (folio && refcount > folio_nr_pages(folio) + 1) {
>>
>> Looks racy.
>>
>> What prevents anybody from obtaining a reference just after the check?
>>
>> Lock on folio doesn't stop random filemap_get_entry() from elevating the
>> refcount.
>>
>> folio_ref_freeze() might be required.
>
> I thought the folio lock would be sufficient, but you're right,
> nothing prevents getting a reference after the check. I'll use a
> folio_ref_freeze() when I respin.
>
> Thanks,
> /fuad
>

Is it correct to say that the only non-racy check for refcounts is a
check for refcount == 0?

What do you think of this instead: if a folio exists, don't check the
refcount at all. Just set mappability to NONE, register the callback
(the folio should already have been unmapped, which leaves
folio->page_type available for use), and then drop the filemap's
refcounts.

When the filemap's refcounts are dropped, in most cases (no transient
refcounts) the callback will be invoked right away and can set
mappability to GUEST. If there are transient refcounts, the folio simply
waits for them to drop to 0; the callback is invoked at that point, and
mappability can then be transitioned to GUEST.

If there isn't a folio, then guest_memfd was asked to set mappability
ahead of any folio allocation, and in that case transitioning to GUEST
immediately is correct.

>> > +			/*
>> > +			 * Outstanding references, the folio cannot be faulted
>> > +			 * in by anyone until they're dropped.
>> > +			 */
>> > +			r = xa_err(xa_store(mappable_offsets, i, xval_none, GFP_KERNEL));
>> > +		} else {
>> > +			/*
>> > +			 * No outstanding references. Transition the folio to
>> > +			 * guest mappable immediately.
>> > +			 */
>> > +			r = xa_err(xa_store(mappable_offsets, i, xval_guest, GFP_KERNEL));
>> > +		}
>> > +
>> > +		if (folio) {
>> > +			folio_unlock(folio);
>> > +			folio_put(folio);
>> > +		}
>> > +
>> > +		if (WARN_ON_ONCE(r))
>> > +			break;
>> > +	}
>> > +	filemap_invalidate_unlock(inode->i_mapping);
>> > +
>> > +	return r;
>> > +}
>>
>> -- 
>> Kiryl Shutsemau / Kirill A. Shutemov