From: Fuad Tabba
Date: Thu, 6 Feb 2025 09:45:37 +0000
Subject: Re: [RFC PATCH v5 05/15] KVM: guest_memfd: Folio mappability states and functions that manage their transition
To: Ackerley Tng
Cc: kirill@shutemov.name, kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
 linux-mm@kvack.org, pbonzini@redhat.com, chenhuacai@kernel.org,
 mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com,
 palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com,
 viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org,
 akpm@linux-foundation.org, xiaoyao.li@intel.com, yilun.xu@intel.com,
 chao.p.peng@linux.intel.com, jarkko@kernel.org, amoorthy@google.com,
 dmatlack@google.com, yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com,
 mic@digikod.net, vbabka@suse.cz, vannapurve@google.com,
 mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com,
 wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com, steven.price@arm.com,
 quic_eberman@quicinc.com, quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
 quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
 yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org, will@kernel.org,
 qperret@google.com, keirf@google.com, roypat@amazon.co.uk, shuah@kernel.org,
 hch@infradead.org, jgg@nvidia.com, rientjes@google.com, jhubbard@nvidia.com,
 fvdl@google.com, hughd@google.com, jthoughton@google.com

Hi Ackerley,

On Thu, 6 Feb 2025 at 03:14, Ackerley Tng wrote:
>
> Fuad Tabba writes:
>
> > On Mon, 20 Jan 2025 at 10:30, Kirill A. Shutemov wrote:
> >>
> >> On Fri, Jan 17, 2025 at 04:29:51PM +0000, Fuad Tabba wrote:
> >> > +/*
> >> > + * Marks the range [start, end) as not mappable by the host. If the host doesn't
> >> > + * have any references to a particular folio, then that folio is marked as
> >> > + * mappable by the guest.
> >> > + *
> >> > + * However, if the host still has references to the folio, then the folio is
> >> > + * marked and not mappable by anyone. Marking it is not mappable allows it to
> >> > + * drain all references from the host, and to ensure that the hypervisor does
> >> > + * not transition the folio to private, since the host still might access it.
> >> > + *
> >> > + * Usually called when guest unshares memory with the host.
> >> > + */
> >> > +static int gmem_clear_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
> >> > +{
> >> > +	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
> >> > +	void *xval_guest = xa_mk_value(KVM_GMEM_GUEST_MAPPABLE);
> >> > +	void *xval_none = xa_mk_value(KVM_GMEM_NONE_MAPPABLE);
> >> > +	pgoff_t i;
> >> > +	int r = 0;
> >> > +
> >> > +	filemap_invalidate_lock(inode->i_mapping);
> >> > +	for (i = start; i < end; i++) {
> >> > +		struct folio *folio;
> >> > +		int refcount = 0;
> >> > +
> >> > +		folio = filemap_lock_folio(inode->i_mapping, i);
> >> > +		if (!IS_ERR(folio)) {
> >> > +			refcount = folio_ref_count(folio);
> >> > +		} else {
> >> > +			r = PTR_ERR(folio);
> >> > +			if (WARN_ON_ONCE(r != -ENOENT))
> >> > +				break;
> >> > +
> >> > +			folio = NULL;
> >> > +		}
> >> > +
> >> > +		/* +1 references are expected because of filemap_lock_folio(). */
> >> > +		if (folio && refcount > folio_nr_pages(folio) + 1) {
> >>
> >> Looks racy.
> >>
> >> What prevent anybody from obtaining a reference just after check?
> >>
> >> Lock on folio doesn't stop random filemap_get_entry() from elevating the
> >> refcount.
> >>
> >> folio_ref_freeze() might be required.
> >
> > I thought the folio lock would be sufficient, but you're right,
> > nothing prevents getting a reference after the check. I'll use a
> > folio_ref_freeze() when I respin.
> >
> > Thanks,
> > /fuad
> >
>
> Is it correct to say that the only non-racy check for refcounts is a
> check for refcount == 0?
>
> What do you think of this instead: If there exists a folio, don't check
> the refcount, just set mappability to NONE and register the callback
> (the folio should already have been unmapped, which leaves
> folio->page_type available for use), and then drop the filemap's
> refcounts. When the filemap's refcounts are dropped, in most cases (no
> transient refcounts) the callback will be hit and the callback can set
> mappability to GUEST.
>
> If there are transient refcounts, the folio will just be waiting
> for the refcounts to drop to 0, and that's when the callback will be hit
> and the mappability can be transitioned to GUEST.
>
> If there isn't a folio, then guest_memfd was requested to set
> mappability ahead of any folio allocation, and in that case
> transitioning to GUEST immediately is correct.

This seems to me to add complexity to the common case that isn't
needed for correctness, and it would make things more difficult to
reason about. If we know that there aren't any mappings at the host
(mapcount == 0), and we know that the refcount has at one point
reached 0 after we have taken the folio lock, then even if the
refcount gets (transiently) elevated, we know that no one at the host
is accessing the folio itself.

Keep in mind that the common case (in a well-behaved system) is that
neither the mapcount nor the refcount is elevated, and for both
performance and understanding, I think that's what we should be
targeting.

Unless of course I'm wrong, and there's a correctness issue here.

Cheers,
/fuad

> >> > +			/*
> >> > +			 * Outstanding references, the folio cannot be faulted
> >> > +			 * in by anyone until they're dropped.
> >> > +			 */
> >> > +			r = xa_err(xa_store(mappable_offsets, i, xval_none, GFP_KERNEL));
> >> > +		} else {
> >> > +			/*
> >> > +			 * No outstanding references. Transition the folio to
> >> > +			 * guest mappable immediately.
> >> > +			 */
> >> > +			r = xa_err(xa_store(mappable_offsets, i, xval_guest, GFP_KERNEL));
> >> > +		}
> >> > +
> >> > +		if (folio) {
> >> > +			folio_unlock(folio);
> >> > +			folio_put(folio);
> >> > +		}
> >> > +
> >> > +		if (WARN_ON_ONCE(r))
> >> > +			break;
> >> > +	}
> >> > +	filemap_invalidate_unlock(inode->i_mapping);
> >> > +
> >> > +	return r;
> >> > +}
> >>
> >> --
> >> Kiryl Shutsemau / Kirill A. Shutemov
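
For illustration only, here is a rough userspace sketch of the
freeze-based check discussed above. It is not kernel code and not
necessarily what the respin will look like: toy_folio,
toy_ref_freeze(), toy_ref_unfreeze(), toy_try_get() and
toy_clear_mappable() are hypothetical names, and C11 atomics stand in
for the folio refcount. The point it tries to show is that a
compare-and-swap from the expected count to 0 checks and pins the
count in one step, so a speculative reference cannot slip in between
the check and the state transition.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the mappability states in the patch. */
enum toy_mappable { TOY_GUEST_MAPPABLE, TOY_NONE_MAPPABLE };

/* Toy folio: only the reference count matters for this sketch. */
struct toy_folio {
	atomic_int refcount;
};

/* Succeeds only if the count equals 'expected'; the count is then pinned at 0. */
static bool toy_ref_freeze(struct toy_folio *f, int expected)
{
	return atomic_compare_exchange_strong(&f->refcount, &expected, 0);
}

/* Restores the count after a successful freeze. */
static void toy_ref_unfreeze(struct toy_folio *f, int count)
{
	assert(atomic_load(&f->refcount) == 0);
	atomic_store(&f->refcount, count);
}

/* Speculative get: refuses to take a reference while the count is 0 (frozen). */
static bool toy_try_get(struct toy_folio *f)
{
	int old = atomic_load(&f->refcount);

	while (old != 0) {
		/* On failure 'old' is reloaded, so the loop re-checks for 0. */
		if (atomic_compare_exchange_weak(&f->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Decide the new state. 'expected' is the number of references the caller
 * knows about (assumed here: page cache plus its own lookup reference);
 * anything above that is treated as an outstanding host reference.
 */
static enum toy_mappable toy_clear_mappable(struct toy_folio *f, int expected)
{
	if (!toy_ref_freeze(f, expected))
		return TOY_NONE_MAPPABLE;	/* extra references: wait for them to drain */

	/* While frozen, toy_try_get() fails, so no new reference can appear here. */
	toy_ref_unfreeze(f, expected);
	return TOY_GUEST_MAPPABLE;
}
```

With a count of 2 and expected == 2, toy_clear_mappable() returns
TOY_GUEST_MAPPABLE and leaves the count at 2. With a count of 3 the
freeze fails, the count is untouched, and the result is
TOY_NONE_MAPPABLE.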