From: Ackerley Tng
To: Fuad Tabba
Cc: vbabka@suse.cz, kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
	linux-mm@kvack.org, pbonzini@redhat.com, chenhuacai@kernel.org,
	mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com,
	palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com,
	viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org,
	akpm@linux-foundation.org, xiaoyao.li@intel.com, yilun.xu@intel.com,
	chao.p.peng@linux.intel.com, jarkko@kernel.org, amoorthy@google.com,
	dmatlack@google.com, yu.c.zhang@linux.intel.com,
	isaku.yamahata@intel.com, mic@digikod.net, vannapurve@google.com,
	mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com,
	wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com,
	kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
	steven.price@arm.com, quic_eberman@quicinc.com,
	quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
	quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
	quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
	catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com,
	oliver.upton@linux.dev, maz@kernel.org, will@kernel.org,
	qperret@google.com, keirf@google.com, roypat@amazon.co.uk,
	shuah@kernel.org, hch@infradead.org, jgg@nvidia.com,
	rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com,
	hughd@google.com, jthoughton@google.com
Subject: Re: [RFC PATCH v5 06/15] KVM: guest_memfd: Handle final folio_put() of guestmem pages
Date: Wed, 22 Jan 2025 22:24:51 +0000
In-Reply-To: (message from Fuad Tabba on Mon, 20 Jan 2025 12:14:50 +0000)
Fuad Tabba writes:

>> >
>> >
>> > +/*
>> > + * Registers a callback to __folio_put(), so that gmem knows that the host does
>> > + * not have any references to the folio. It does that by setting the folio type
>> > + * to guestmem.
>> > + *
>> > + * Returns 0 if the host doesn't have any references, or -EAGAIN if the host
>> > + * has references, and the callback has been registered.
>>
>> Note this comment.
>>
>> > + *
>> > + * Must be called with the following locks held:
>> > + * - filemap (inode->i_mapping) invalidate_lock
>> > + * - folio lock
>> > + */
>> > +static int __gmem_register_callback(struct folio *folio, struct inode *inode, pgoff_t idx)
>> > +{
>> > +	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
>> > +	void *xval_guest = xa_mk_value(KVM_GMEM_GUEST_MAPPABLE);
>> > +	int refcount;
>> > +
>> > +	rwsem_assert_held_write_nolockdep(&inode->i_mapping->invalidate_lock);
>> > +	WARN_ON_ONCE(!folio_test_locked(folio));
>> > +
>> > +	if (folio_mapped(folio) || folio_test_guestmem(folio))
>> > +		return -EAGAIN;
>>
>> But here we return -EAGAIN and no callback was registered?
>
> This is intentional. If the folio is still mapped (i.e., its mapcount
> is elevated), then we cannot register the callback yet, so the
> host/vmm needs to unmap first, then try again. That said, I see the
> problem with the comment above, and I will clarify this.
>
>> > +
>> > +	/* Register a callback first. */
>> > +	__folio_set_guestmem(folio);
>> > +
>> > +	/*
>> > +	 * Check for references after setting the type to guestmem, to guard
>> > +	 * against potential races with the refcount being decremented later.
>> > +	 *
>> > +	 * At least one reference is expected because the folio is locked.
>> > +	 */
>> > +
>> > +	refcount = folio_ref_sub_return(folio, folio_nr_pages(folio));
>> > +	if (refcount == 1) {
>> > +		int r;
>> > +
>> > +		/* refcount isn't elevated, it's now faultable by the guest. */
>>
>> Again this seems racy, somebody could have just speculatively increased it.
>> Maybe we need to freeze here as well?
>
> A speculative increase here is ok I think (famous last words). The
> callback was registered before the check, therefore, such an increase
> would trigger the callback.
>
> Thanks,
> /fuad
>

I checked the callback (kvm_gmem_handle_folio_put()) and agree with you
that the mappability reset to KVM_GMEM_GUEST_MAPPABLE is handled
correctly (since kvm_gmem_handle_folio_put() doesn't assume anything
about the mappability state at callback time).

However, what if the new speculative reference writes to the page and
the guest then goes on to fault in and use the page?

>> > +		r = WARN_ON_ONCE(xa_err(xa_store(mappable_offsets, idx, xval_guest, GFP_KERNEL)));
>> > +		if (!r)
>> > +			__kvm_gmem_restore_pending_folio(folio);
>> > +
>> > +		return r;
>> > +	}
>> > +
>> > +	return -EAGAIN;
>> > +}
>> > +
>> > +int kvm_slot_gmem_register_callback(struct kvm_memory_slot *slot, gfn_t gfn)
>> > +{
>> > +	unsigned long pgoff = slot->gmem.pgoff + gfn - slot->base_gfn;
>> > +	struct inode *inode = file_inode(slot->gmem.file);
>> > +	struct folio *folio;
>> > +	int r;
>> > +
>> > +	filemap_invalidate_lock(inode->i_mapping);
>> > +
>> > +	folio = filemap_lock_folio(inode->i_mapping, pgoff);
>> > +	if (WARN_ON_ONCE(IS_ERR(folio))) {
>> > +		r = PTR_ERR(folio);
>> > +		goto out;
>> > +	}
>> > +
>> > +	r = __gmem_register_callback(folio, inode, pgoff);
>> > +
>> > +	folio_unlock(folio);
>> > +	folio_put(folio);
>> > +out:
>> > +	filemap_invalidate_unlock(inode->i_mapping);
>> > +
>> > +	return r;
>> > +}
>> > +
>> > +/*
>> > + * Callback function for __folio_put(), i.e., called when all references by the
>> > + * host to the folio have been dropped. This allows gmem to transition the state
>> > + * of the folio to mappable by the guest, and allows the hypervisor to continue
>> > + * transitioning its state to private, since the host cannot attempt to access
>> > + * it anymore.
>> > + */
>> > +void kvm_gmem_handle_folio_put(struct folio *folio)
>> > +{
>> > +	struct xarray *mappable_offsets;
>> > +	struct inode *inode;
>> > +	pgoff_t index;
>> > +	void *xval;
>> > +
>> > +	inode = folio->mapping->host;
>> > +	index = folio->index;
>> > +	mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
>> > +	xval = xa_mk_value(KVM_GMEM_GUEST_MAPPABLE);
>> > +
>> > +	filemap_invalidate_lock(inode->i_mapping);
>> > +	__kvm_gmem_restore_pending_folio(folio);
>> > +	WARN_ON_ONCE(xa_err(xa_store(mappable_offsets, index, xval, GFP_KERNEL)));
>> > +	filemap_invalidate_unlock(inode->i_mapping);
>> > +}
>> > +
>> >  static bool gmem_is_mappable(struct inode *inode, pgoff_t pgoff)
>> >  {
>> > 	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
>>