From: Fuad Tabba <tabba@google.com>
Date: Thu, 30 Jan 2025 14:23:50 +0000
Subject: Re: [RFC PATCH v5 06/15] KVM: guest_memfd: Handle final folio_put() of guestmem pages
To: Ackerley Tng
Cc: vbabka@suse.cz, kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
	linux-mm@kvack.org, pbonzini@redhat.com, chenhuacai@kernel.org,
	mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com,
	palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com,
	viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org,
	akpm@linux-foundation.org, xiaoyao.li@intel.com, yilun.xu@intel.com,
	chao.p.peng@linux.intel.com, jarkko@kernel.org, amoorthy@google.com,
	dmatlack@google.com, yu.c.zhang@linux.intel.com,
	isaku.yamahata@intel.com, mic@digikod.net, vannapurve@google.com,
	mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com,
	wei.w.wang@intel.com, liam.merwick@oracle.com,
	isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com,
	suzuki.poulose@arm.com, steven.price@arm.com,
	quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
	quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
	quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
	quic_pheragu@quicinc.com, catalin.marinas@arm.com,
	james.morse@arm.com, yuzenghui@huawei.com, oliver.upton@linux.dev,
	maz@kernel.org, will@kernel.org, qperret@google.com,
	keirf@google.com, roypat@amazon.co.uk, shuah@kernel.org,
	hch@infradead.org, jgg@nvidia.com, rientjes@google.com,
	jhubbard@nvidia.com, fvdl@google.com, hughd@google.com,
	jthoughton@google.com
Content-Type: text/plain; charset="UTF-8"

Hi Ackerley,

On Wed, 22 Jan 2025 at 22:24, Ackerley Tng wrote:
>
> Fuad Tabba writes:
>
> >> >
> >> >
> >> > +/*
> >> > + * Registers a callback to __folio_put(), so that gmem knows that the host does
> >> > + * not have any references to the folio. It does that by setting the folio type
> >> > + * to guestmem.
> >> > + *
> >> > + * Returns 0 if the host doesn't have any references, or -EAGAIN if the host
> >> > + * has references, and the callback has been registered.
> >>
> >> Note this comment.
> >>
> >> > + *
> >> > + * Must be called with the following locks held:
> >> > + * - filemap (inode->i_mapping) invalidate_lock
> >> > + * - folio lock
> >> > + */
> >> > +static int __gmem_register_callback(struct folio *folio, struct inode *inode, pgoff_t idx)
> >> > +{
> >> > +	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
> >> > +	void *xval_guest = xa_mk_value(KVM_GMEM_GUEST_MAPPABLE);
> >> > +	int refcount;
> >> > +
> >> > +	rwsem_assert_held_write_nolockdep(&inode->i_mapping->invalidate_lock);
> >> > +	WARN_ON_ONCE(!folio_test_locked(folio));
> >> > +
> >> > +	if (folio_mapped(folio) || folio_test_guestmem(folio))
> >> > +		return -EAGAIN;
> >>
> >> But here we return -EAGAIN and no callback was registered?
> >
> > This is intentional. If the folio is still mapped (i.e., its mapcount
> > is elevated), then we cannot register the callback yet, so the
> > host/vmm needs to unmap first, then try again. That said, I see the
> > problem with the comment above, and I will clarify this.
> >
> >> > +
> >> > +	/* Register a callback first. */
> >> > +	__folio_set_guestmem(folio);
> >> > +
> >> > +	/*
> >> > +	 * Check for references after setting the type to guestmem, to guard
> >> > +	 * against potential races with the refcount being decremented later.
> >> > +	 *
> >> > +	 * At least one reference is expected because the folio is locked.
> >> > +	 */
> >> > +
> >> > +	refcount = folio_ref_sub_return(folio, folio_nr_pages(folio));
> >> > +	if (refcount == 1) {
> >> > +		int r;
> >> > +
> >> > +		/* refcount isn't elevated, it's now faultable by the guest. */
> >>
> >> Again this seems racy, somebody could have just speculatively increased it.
> >> Maybe we need to freeze here as well?
> >
> > A speculative increase here is ok I think (famous last words). The
> > callback was registered before the check, therefore, such an increase
> > would trigger the callback.
> >
> > Thanks,
> > /fuad
> >
>
> I checked the callback (kvm_gmem_handle_folio_put()) and agree with you
> that the mappability reset to KVM_GMEM_GUEST_MAPPABLE is handled
> correctly (since kvm_gmem_handle_folio_put() doesn't assume anything
> about the mappability state at callback-time).
>
> However, what if the new speculative reference writes to the page and
> the guest goes on to fault/use the page?

In my last email I explained why I thought the code was fine as it is.
Now that I'm updating the patch series with all the comments, I have
realized that even if I were right (which I am starting to doubt),
freezing the refcount makes the code easier to reason about. So I'm
going with a refcount freeze here as well when I respin.
Thanks again,
/fuad

> >> > +		r = WARN_ON_ONCE(xa_err(xa_store(mappable_offsets, idx, xval_guest, GFP_KERNEL)));
> >> > +		if (!r)
> >> > +			__kvm_gmem_restore_pending_folio(folio);
> >> > +
> >> > +		return r;
> >> > +	}
> >> > +
> >> > +	return -EAGAIN;
> >> > +}
> >> > +
> >> > +int kvm_slot_gmem_register_callback(struct kvm_memory_slot *slot, gfn_t gfn)
> >> > +{
> >> > +	unsigned long pgoff = slot->gmem.pgoff + gfn - slot->base_gfn;
> >> > +	struct inode *inode = file_inode(slot->gmem.file);
> >> > +	struct folio *folio;
> >> > +	int r;
> >> > +
> >> > +	filemap_invalidate_lock(inode->i_mapping);
> >> > +
> >> > +	folio = filemap_lock_folio(inode->i_mapping, pgoff);
> >> > +	if (WARN_ON_ONCE(IS_ERR(folio))) {
> >> > +		r = PTR_ERR(folio);
> >> > +		goto out;
> >> > +	}
> >> > +
> >> > +	r = __gmem_register_callback(folio, inode, pgoff);
> >> > +
> >> > +	folio_unlock(folio);
> >> > +	folio_put(folio);
> >> > +out:
> >> > +	filemap_invalidate_unlock(inode->i_mapping);
> >> > +
> >> > +	return r;
> >> > +}
> >> > +
> >> > +/*
> >> > + * Callback function for __folio_put(), i.e., called when all references by the
> >> > + * host to the folio have been dropped. This allows gmem to transition the state
> >> > + * of the folio to mappable by the guest, and allows the hypervisor to continue
> >> > + * transitioning its state to private, since the host cannot attempt to access
> >> > + * it anymore.
> >> > + */
> >> > +void kvm_gmem_handle_folio_put(struct folio *folio)
> >> > +{
> >> > +	struct xarray *mappable_offsets;
> >> > +	struct inode *inode;
> >> > +	pgoff_t index;
> >> > +	void *xval;
> >> > +
> >> > +	inode = folio->mapping->host;
> >> > +	index = folio->index;
> >> > +	mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
> >> > +	xval = xa_mk_value(KVM_GMEM_GUEST_MAPPABLE);
> >> > +
> >> > +	filemap_invalidate_lock(inode->i_mapping);
> >> > +	__kvm_gmem_restore_pending_folio(folio);
> >> > +	WARN_ON_ONCE(xa_err(xa_store(mappable_offsets, index, xval, GFP_KERNEL)));
> >> > +	filemap_invalidate_unlock(inode->i_mapping);
> >> > +}
> >> > +
> >> >  static bool gmem_is_mappable(struct inode *inode, pgoff_t pgoff)
> >> >  {
> >> >  	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
> >>