Date: Sat, 4 Jan 2025 10:15:30 +0100
From: Greg Kroah-Hartman
To: Yuanchu Xie
Cc: Wei Liu, Rob Bradford, Pasha Tatashin, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, virtualization@lists.linux.dev,
	dev@lists.cloudhypervisor.org
Subject: Re: [PATCH v5 1/2] virt: pvmemcontrol: control guest physical memory properties
Message-ID: <2025010448-citizen-untrained-d607@gregkh>
References: <20241203002328.694071-1-yuanchu@google.com>
In-Reply-To: <20241203002328.694071-1-yuanchu@google.com>

On Mon, Dec 02, 2024 at 04:23:27PM -0800, Yuanchu Xie wrote:
> Pvmemcontrol provides a way for the guest to control its physical memory
> properties and enables
> optimizations and security features. For example,
> the guest can provide information to the host where parts of a hugepage
> may be unbacked, or sensitive data may not be swapped out, etc.
>
> Pvmemcontrol allows guests to manipulate its gPTE entries in the SLAT,
> and also some other properties of the memory mapping on the host.
> This is achieved by using the KVM_CAP_SYNC_MMU capability. When this
> capability is available, the changes in the backing of the memory region
> on the host are automatically reflected into the guest. For example, an
> mmap() or madvise() that affects the region will be made visible
> immediately.
>
> There are two components of the implementation: the guest Linux driver
> and Virtual Machine Monitor (VMM) device. A guest-allocated shared
> buffer is negotiated per-cpu through a few PCI MMIO registers; the VMM
> device assigns a unique command for each per-cpu buffer. The guest
> writes its pvmemcontrol request in the per-cpu buffer, then writes the
> corresponding command into the command register, calling into the VMM
> device to perform the pvmemcontrol request.
>
> The synchronous per-cpu shared buffer approach avoids the kick and busy
> waiting that the guest would have to do with virtio virtqueue transport.
>
> User API
> From the userland, the pvmemcontrol guest driver is controlled via the
> ioctl(2) call. It requires CAP_SYS_ADMIN.
>
> ioctl(fd, PVMEMCONTROL_IOCTL, struct pvmemcontrol_buf *buf);
>
> Guest userland applications can tag VMAs and guest hugepages, or advise
> the host on how to handle sensitive guest pages.
>
> Supported function codes and their use cases:
> PVMEMCONTROL_FREE/REMOVE/DONTNEED/PAGEOUT. For the guest. One can reduce
> the struct page and page table lookup overhead by using hugepages backed
> by smaller pages on the host. These pvmemcontrol commands can allow for
> partial freeing of private guest hugepages to save memory.
> They also allow kernel memory, such as kernel stacks and task_structs to be
> paravirtualized if we expose kernel APIs.
>
> PVMEMCONTROL_MERGEABLE can inform the host KSM to deduplicate VM pages.
>
> PVMEMCONTROL_UNMERGEABLE is useful for security, when the VM does not
> want to share its backing pages.
> The same with PVMEMCONTROL_DONTDUMP, so sensitive pages are not included
> in a dump.
> MLOCK/UNLOCK can advise the host that sensitive information is not
> swapped out on the host.
>
> PVMEMCONTROL_MPROTECT_NONE/R/W/RW. For guest stacks backed by hugepages,
> stack guard pages can be handled in the host and memory can be saved in
> the hugepage.
>
> PVMEMCONTROL_SET_VMA_ANON_NAME is useful for observability and debugging
> how guest memory is being mapped on the host.
>
> Sample program making use of PVMEMCONTROL_DONTNEED:
> https://github.com/Dummyc0m/pvmemcontrol-user
>
> The VMM implementation is part of Cloud Hypervisor, the feature
> pvmemcontrol can be enabled and the VMM can then provide the device to a
> supporting guest.
> https://github.com/cloud-hypervisor/cloud-hypervisor
>
> Signed-off-by: Yuanchu Xie
>
> ---
> PATCH v4 -> v5
> - use drvdata and friends to enable multiple devices

And now you are "burning" a whole major number for this, right? Why not
just use a misc device for every individual one? That should be much
simpler and take away a lot of the generic code you have added here
(your class structure, your major/minor number handling, etc.)

> PATCH v3 -> v4
> - changed dev_info to dev_dbg so the driver is quiet when it works
>   properly.
> - Edited the changelog section to be included in the diffstat.
>
> PATCH v2 -> v3
> - added PVMEMCONTROL_MERGEABLE for memory dedupe.
> - updated link to the upstream Cloud Hypervisor repo, and specify the
>   feature required to enable the device.
>
> PATCH v1 -> v2
> - fixed byte order sparse warning. ioread/write already does
>   little-endian.
> - add include for linux/percpu.h
>
> RFC v1 -> PATCH v1
> - renamed memctl to pvmemcontrol
> - defined device endianness as little endian
>
> v1:
> https://lore.kernel.org/linux-mm/20240518072422.771698-1-yuanchu@google.com/
> v2:
> https://lore.kernel.org/linux-mm/20240612021207.3314369-1-yuanchu@google.com/
> v3:
> https://lore.kernel.org/linux-mm/20241016193947.48534-1-yuanchu@google.com/
> v4:
> https://lore.kernel.org/linux-mm/20241021204849.1580384-1-yuanchu@google.com/
>
>  .../userspace-api/ioctl/ioctl-number.rst      |   2 +
>  drivers/virt/Kconfig                          |   2 +
>  drivers/virt/Makefile                         |   1 +
>  drivers/virt/pvmemcontrol/Kconfig             |  10 +
>  drivers/virt/pvmemcontrol/Makefile            |   2 +
>  drivers/virt/pvmemcontrol/pvmemcontrol.c      | 499 ++++++++++++++++++

Why a whole subdirectory for just one .c file? Why not put it in
drivers/virt/?

thanks,

greg k-h
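
For anyone following the quoted "User API" section, the call it describes
could be sketched from userspace roughly as below. This is only an
illustration under stated assumptions: the struct layout, the ioctl number,
the function-code value and any device path passed in are placeholders
invented here (all prefixed sketch_/SKETCH_). The real struct
pvmemcontrol_buf and PVMEMCONTROL_IOCTL come from the uapi header added by
the patch, which is not reproduced in this mail.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * HYPOTHETICAL definitions, for illustration only. They stand in for
 * struct pvmemcontrol_buf, PVMEMCONTROL_IOCTL and the function codes,
 * whose real layout and values are defined by the patch's uapi header.
 */
struct sketch_pvmemcontrol_buf {
	uint64_t func_code;	/* which request, e.g. a DONTNEED-style one */
	uint64_t addr;		/* start of the guest range */
	uint64_t length;	/* length of the range in bytes */
};

#define SKETCH_PVMEMCONTROL_IOCTL \
	_IOWR('x', 0, struct sketch_pvmemcontrol_buf)
#define SKETCH_FUNC_DONTNEED 1ULL

/*
 * Open the device node, issue one request through ioctl(2), and close
 * the node again. Returns 0 on success or a negative errno value.
 * The caller is expected to hold CAP_SYS_ADMIN, as the changelog says.
 */
int sketch_pvmemcontrol_call(const char *dev_path, uint64_t func_code,
			     uint64_t addr, uint64_t length)
{
	struct sketch_pvmemcontrol_buf buf = {
		.func_code = func_code,
		.addr = addr,
		.length = length,
	};
	int fd, ret = 0;

	fd = open(dev_path, O_RDWR);
	if (fd < 0)
		return -errno;

	/* Save errno before close() has a chance to overwrite it. */
	if (ioctl(fd, SKETCH_PVMEMCONTROL_IOCTL, &buf) < 0)
		ret = -errno;

	close(fd);
	return ret;
}
```

A caller would pass whatever node the driver ends up creating as dev_path;
on a system without that node the helper simply returns -ENOENT.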