d="scan'208";a="439614387" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Aug 2023 08:12:31 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10818"; a="804574149" X-IronPort-AV: E=Sophos;i="6.02,214,1688454000"; d="scan'208";a="804574149" Received: from binbinwu-mobl.ccr.corp.intel.com (HELO [10.93.25.116]) ([10.93.25.116]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Aug 2023 08:12:21 -0700 Message-ID: <30ffe039-c9e2-b996-500d-5e11bf6ea789@linux.intel.com> Date: Wed, 30 Aug 2023 23:12:19 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.14.0 Subject: Re: [RFC PATCH v11 12/29] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory To: Sean Christopherson Cc: kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org, Paolo Bonzini , Marc Zyngier , Oliver Upton , Huacai Chen , Michael Ellerman , Anup Patel , Paul Walmsley , Palmer Dabbelt , Albert Ou , "Matthew Wilcox (Oracle)" , Andrew Morton , Paul Moore , James Morris , "Serge E. Hallyn" , Chao Peng , Fuad Tabba , Jarkko Sakkinen , Yu Zhang , Vishal Annapurve , Ackerley Tng , Maciej Szmigiero , Vlastimil Babka , David Hildenbrand , Quentin Perret , Michael Roth , Wang , Liam Merwick , Isaku Yamahata , "Kirill A . Shutemov" References: <20230718234512.1690985-1-seanjc@google.com> <20230718234512.1690985-13-seanjc@google.com> From: Binbin Wu In-Reply-To: <20230718234512.1690985-13-seanjc@google.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 97EA6180017 X-Stat-Signature: 98rhs4r9gzgfwjmaip49d54q1qgixeya X-HE-Tag: 1693408582-249020 X-HE-Meta: U2FsdGVkX1+eU2hnEr0ZAcAWRkAR0JyUTTV6ZvZk6zQIP8hENNFVcGgSxbbBArkkuGKVDVQVNCNfY4Gyit3YMVBU5WbQKRlmuPa3IrIMXKa7MwQw54lqrLU13cg1sgD0xLLWsARx+2XVte8yHrpVxVad+64FaEgQhwwO8/jGLbm04Hhw5AnEsp5ANYm850rQsdr9hs9ubJPve9aYHljWvHs0vtrus962tVXqJUeyDS2woi8JP+RrlJvxYUJFajMNaOGIcW276xSaVdDS0wM5omCU+mfGiKS9iIKrDCkqIbJ3ZxjascSTQoxZwAKl5QrN9JI9o6kHVUaAURKp6s5ygPQfAbRLs/FTVP7Dz4KjmqCrdL5iKcbMmbuUma1RGZoVD+wzcFRs6TjTJjp8NYJAoHOWu12YvOvqLdXCBM2RjpREjlL44YQCoJalOwcfGIv90RFOZTwe3RCrXkqV7bCMSsmHHP3PvoL1tkC1IgjxxJW5oENTBo+7UGPJpROv+hfsenGbclIMO9vPznXaDWgRKC2rS43VECp+YaYBB2rDzvtnAfGFIxzaGSSLVXaPFZuYvQ1KpUU5BrYqwgsjsFF7FnpbfNMzo3ECn3C4/JY0kfoH49aTGnCmjDgVeIZSljXZlrCbo+khWgsy7L/AngTsHabl1y/uCAJnyEMHr1DtGF/TR/leVAXM/QCN7p4Y/5rdyJMjF9FArj/ROYvChJORceYChw8/8Fl+IhBqt4IuUvHSrwUBK2RjYDKlsreH/BHhp34U0e7FnLbDLwXjUPdRJnD/nRpxiguBK8gU2KmqRIt+HhZa5gX7Up2Ffedb4RXHi4bPkkrHIdI1alWZKUxG/y+cuLKI4SOg/5r0UMdGmDOwF+QEo+fq8ikc4Ia4JoYu8s3wyGsdq+57HTGpgpb6LJ86+kBJ971wG50SVd3x+u//FvtWTTkc1V+aEmzTYmfb8uLXfbVarfRY783NbBT HbVJ9oId 32ES5TKc1/DTh/JtX1XItHUOuUuAw72NUTa5+x2sPi9YgiP6+tXtM67JlgAJGeCTa0dXYLd+JFGVTw7uwvauj/u6aLFpnEtCi3yB+9Sb47+H1tAYBZzWs5/VXvpcjLb7NVF7/uVT0LXyA5Kz1U5rINLm/9+sBhccSRa6KS0HZ3ZTZY29glz6bRITLHWzAS67yHu9JFr0pAKkdfwiWChovq5cGJ8llUgEgTtUtLSTmo8htmV0JjKWMJMeX0Q== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 7/19/2023 7:44 AM, Sean Christopherson wrote: [...] 
> +
> +static struct folio *kvm_gmem_get_folio(struct file *file, pgoff_t index)
> +{
> +        struct folio *folio;
> +
> +        /* TODO: Support huge pages. */
> +        folio = filemap_grab_folio(file->f_mapping, index);
> +        if (!folio)

Should use if (IS_ERR(folio)) instead.
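E.g., a rough sketch of what I mean (untested; my understanding is that filemap_grab_folio() reports failure via an ERR_PTR() rather than NULL, so a plain NULL check would miss the error):

        /* TODO: Support huge pages. */
        folio = filemap_grab_folio(file->f_mapping, index);
        if (IS_ERR(folio))
                return NULL;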
> +                return NULL;
> +
> +        /*
> +         * Use the up-to-date flag to track whether or not the memory has been
> +         * zeroed before being handed off to the guest. There is no backing
> +         * storage for the memory, so the folio will remain up-to-date until
> +         * it's removed.
> +         *
> +         * TODO: Skip clearing pages when trusted firmware will do it when
> +         * assigning memory to the guest.
> +         */
> +        if (!folio_test_uptodate(folio)) {
> +                unsigned long nr_pages = folio_nr_pages(folio);
> +                unsigned long i;
> +
> +                for (i = 0; i < nr_pages; i++)
> +                        clear_highpage(folio_page(folio, i));
> +
> +                folio_mark_uptodate(folio);
> +        }
> +
> +        /*
> +         * Ignore accessed, referenced, and dirty flags. The memory is
> +         * unevictable and there is no storage to write back to.
> +         */
> +        return folio;
> +}

[...]

> +
> +static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len)
> +{
> +        struct address_space *mapping = inode->i_mapping;
> +        pgoff_t start, index, end;
> +        int r;
> +
> +        /* Dedicated guest is immutable by default. */
> +        if (offset + len > i_size_read(inode))
> +                return -EINVAL;
> +
> +        filemap_invalidate_lock_shared(mapping);
> +
> +        start = offset >> PAGE_SHIFT;
> +        end = (offset + len) >> PAGE_SHIFT;
> +
> +        r = 0;
> +        for (index = start; index < end; ) {
> +                struct folio *folio;
> +
> +                if (signal_pending(current)) {
> +                        r = -EINTR;
> +                        break;
> +                }
> +
> +                folio = kvm_gmem_get_folio(inode, index);
> +                if (!folio) {
> +                        r = -ENOMEM;
> +                        break;
> +                }
> +
> +                index = folio_next_index(folio);
> +
> +                folio_unlock(folio);
> +                folio_put(folio);

Maybe a dumb question: why do we grab the folio and then put it immediately? Will that cause the folio to be released back to the page allocator? (My reading of the refcounting is at the end of this mail.)

> +
> +                /* 64-bit only, wrapping the index should be impossible. */
> +                if (WARN_ON_ONCE(!index))
> +                        break;
> +
> +                cond_resched();
> +        }
> +
> +        filemap_invalidate_unlock_shared(mapping);
> +
> +        return r;
> +}
> +

[...]

> +
> +int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
> +                  unsigned int fd, loff_t offset)
> +{
> +        loff_t size = slot->npages << PAGE_SHIFT;
> +        unsigned long start, end, flags;
> +        struct kvm_gmem *gmem;
> +        struct inode *inode;
> +        struct file *file;
> +
> +        BUILD_BUG_ON(sizeof(gfn_t) != sizeof(slot->gmem.pgoff));
> +
> +        file = fget(fd);
> +        if (!file)
> +                return -EINVAL;
> +
> +        if (file->f_op != &kvm_gmem_fops)
> +                goto err;
> +
> +        gmem = file->private_data;
> +        if (gmem->kvm != kvm)
> +                goto err;
> +
> +        inode = file_inode(file);
> +        flags = (unsigned long)inode->i_private;
> +
> +        /*
> +         * For simplicity, require the offset into the file and the size of the
> +         * memslot to be aligned to the largest possible page size used to back
> +         * the file (same as the size of the file itself).
> +         */
> +        if (!kvm_gmem_is_valid_size(offset, flags) ||
> +            !kvm_gmem_is_valid_size(size, flags))
> +                goto err;
> +
> +        if (offset + size > i_size_read(inode))
> +                goto err;
> +
> +        filemap_invalidate_lock(inode->i_mapping);
> +
> +        start = offset >> PAGE_SHIFT;
> +        end = start + slot->npages;
> +
> +        if (!xa_empty(&gmem->bindings) &&
> +            xa_find(&gmem->bindings, &start, end - 1, XA_PRESENT)) {
> +                filemap_invalidate_unlock(inode->i_mapping);
> +                goto err;
> +        }
> +
> +        /*
> +         * No synchronize_rcu() needed, any in-flight readers are guaranteed to
> +         * be see either a NULL file or this new file, no need for them to go
> +         * away.
> +         */
> +        rcu_assign_pointer(slot->gmem.file, file);
> +        slot->gmem.pgoff = start;
> +
> +        xa_store_range(&gmem->bindings, start, end - 1, slot, GFP_KERNEL);
> +        filemap_invalidate_unlock(inode->i_mapping);
> +
> +        /*
> +         * Drop the reference to the file, even on success. The file pins KVM,
> +         * not the other way 'round. Active bindings are invalidated if the

There is an extra ' here, or maybe it should be "around"?

> +         * file is closed before memslots are destroyed.
> +         */
> +        fput(file);
> +        return 0;
> +
> +err:
> +        fput(file);
> +        return -EINVAL;
> +}
> +

[...]
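Regarding my folio_put() question above, here is the refcounting as I read it, just to frame the question (my own understanding, not something the patch states explicitly):

        folio = kvm_gmem_get_folio(inode, index);
        /*
         * At this point the folio lives in inode->i_mapping (inserted by
         * filemap_grab_folio() with FGP_CREAT), so the page cache holds its
         * own reference; the reference handed back to us is an extra one, and
         * the folio is also locked (FGP_LOCK).
         */

        folio_unlock(folio);
        folio_put(folio);
        /*
         * These release only the lock and our extra reference. The page cache
         * reference remains (and the memory is unevictable), so the folio is
         * not given back to the page allocator here.
         */

Is that the right way to read it?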