Date: Tue, 29 Oct 2024 16:05:54 -0700
From: Elliot Berman
To: James Gowans
CC: Sean Christopherson, Paolo Bonzini, Alexander Viro, Steve Sistare,
    Christian Brauner, Jan Kara, Anthony Yznaga, Mike Rapoport,
    Andrew Morton, Jason Gunthorpe, Usama Arif, Alexander Graf,
    David Woodhouse, Paul Durrant, Nicolas Saenz Julienne
Subject: Re: [PATCH 05/10] guestmemfs: add file mmap callback
Message-ID: <20241029120232032-0700.eberman@hu-eberman-lv.qualcomm.com>
References: <20240805093245.889357-1-jgowans@amazon.com>
    <20240805093245.889357-6-jgowans@amazon.com>
In-Reply-To: <20240805093245.889357-6-jgowans@amazon.com>

On Mon, Aug 05, 2024 at 11:32:40AM +0200, James Gowans wrote:
> Make the file data usable to userspace by adding mmap. That's all that
> QEMU needs for guest RAM, so that's all we bother implementing for now.
>
> When mmaping the file the VMA is marked as PFNMAP to indicate that there
> are no struct pages for the memory in this VMA.
> Remap_pfn_range() is used to actually populate the page tables. All PTEs
> are pre-faulted into the pgtables at mmap time so that the pgtables are
> usable when this virtual address range is given to VFIO's MAP_DMA.

Thanks for sending this out! I'm going through the series with the
intention of seeing how it might fit with the existing guest_memfd work
for pKVM/CoCo/Gunyah.

It might've been mentioned in the MM alignment session -- you might be
interested in joining the guest_memfd bi-weekly call to see where our
work overlaps [1].

[1]: https://lore.kernel.org/kvm/ae794891-fe69-411a-b82e-6963b594a62a@redhat.com/T/

---

Was the decision to pre-fault everything made for convenience, or is it
intentionally different from hugetlb, which by default populates page
tables lazily at fault time?

>
> Signed-off-by: James Gowans
> ---
>  fs/guestmemfs/file.c       | 43 +++++++++++++++++++++++++++++++++++++-
>  fs/guestmemfs/guestmemfs.c |  2 +-
>  fs/guestmemfs/guestmemfs.h |  3 +++
>  3 files changed, 46 insertions(+), 2 deletions(-)
>
> diff --git a/fs/guestmemfs/file.c b/fs/guestmemfs/file.c
> index 618c93b12196..b1a52abcde65 100644
> --- a/fs/guestmemfs/file.c
> +++ b/fs/guestmemfs/file.c
> @@ -1,6 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0-only
>
>  #include "guestmemfs.h"
> +#include
>
>  static int truncate(struct inode *inode, loff_t newsize)
>  {
> @@ -41,6 +42,46 @@ static int inode_setattr(struct mnt_idmap *idmap, struct dentry *dentry, struct
>  	return 0;
>  }
>
> +/*
> + * To be able to use PFNMAP VMAs for VFIO DMA mapping we need the page tables
> + * populated with mappings. Pre-fault everything.
> + */
> +static int mmap(struct file *filp, struct vm_area_struct *vma)
> +{
> +	int rc;
> +	unsigned long *mappings_block;
> +	struct guestmemfs_inode *guestmemfs_inode;
> +
> +	guestmemfs_inode = guestmemfs_get_persisted_inode(filp->f_inode->i_sb,
> +			filp->f_inode->i_ino);
> +
> +	mappings_block = guestmemfs_inode->mappings;
> +
> +	/* Remap-pfn-range will mark the range VM_IO */
> +	for (unsigned long vma_addr_offset = vma->vm_start;
> +			vma_addr_offset < vma->vm_end;
> +			vma_addr_offset += PMD_SIZE) {
> +		int block, mapped_block;
> +		unsigned long map_size = min(PMD_SIZE, vma->vm_end - vma_addr_offset);
> +
> +		block = (vma_addr_offset - vma->vm_start) / PMD_SIZE;
> +		mapped_block = *(mappings_block + block);
> +		/*
> +		 * It's wrong to use remap_pfn_range; this will install PTE-level entries.
> +		 * The whole point of 2 MiB allocs is to improve TLB perf!
> +		 * We should use something like mm/huge_memory.c#insert_pfn_pmd
> +		 * but that is currently static.
> +		 * TODO: figure out the best way to install PMDs.
> +		 */
> +		rc = remap_pfn_range(vma,
> +				vma_addr_offset,
> +				(guestmemfs_base >> PAGE_SHIFT) + (mapped_block * 512),
> +				map_size,
> +				vma->vm_page_prot);
> +	}
> +	return 0;
> +}
> +
>  const struct inode_operations guestmemfs_file_inode_operations = {
>  	.setattr = inode_setattr,
>  	.getattr = simple_getattr,
> @@ -48,5 +89,5 @@ const struct inode_operations guestmemfs_file_inode_operations = {
>
>  const struct file_operations guestmemfs_file_fops = {
>  	.owner = THIS_MODULE,
> -	.iterate_shared = NULL,
> +	.mmap = mmap,
>  };
> diff --git a/fs/guestmemfs/guestmemfs.c b/fs/guestmemfs/guestmemfs.c
> index c45c796c497a..38f20ad25286 100644
> --- a/fs/guestmemfs/guestmemfs.c
> +++ b/fs/guestmemfs/guestmemfs.c
> @@ -9,7 +9,7 @@
>  #include
>  #include
>
> -static phys_addr_t guestmemfs_base, guestmemfs_size;
> +phys_addr_t guestmemfs_base, guestmemfs_size;
>  struct guestmemfs_sb *psb;
>
>  static int statfs(struct dentry *root, struct kstatfs *buf)
> diff --git a/fs/guestmemfs/guestmemfs.h b/fs/guestmemfs/guestmemfs.h
> index 7ea03ac8ecca..0f2788ce740e 100644
> --- a/fs/guestmemfs/guestmemfs.h
> +++ b/fs/guestmemfs/guestmemfs.h
> @@ -8,6 +8,9 @@
>  #define GUESTMEMFS_FILENAME_LEN 255
>  #define GUESTMEMFS_PSB(sb) ((struct guestmemfs_sb *)sb->s_fs_info)
>
> +/* Units of bytes */
> +extern phys_addr_t guestmemfs_base, guestmemfs_size;
> +
>  struct guestmemfs_sb {
>  	/* Inode number */
>  	unsigned long next_free_ino;
> --
> 2.34.1
>
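
For illustration only, here is a minimal userspace sketch of the offset-to-PFN arithmetic in the quoted mmap() loop. It assumes 4 KiB base pages and 2 MiB PMD-sized blocks, so PAGES_PER_PMD works out to the 512 hard-coded in the patch. SKETCH_PAGE_SHIFT, SKETCH_PMD_SIZE, PAGES_PER_PMD, map_chunks(), remap_fn, struct call_log and record_remap() are hypothetical names. remap_fn only stands in for remap_pfn_range() and is not a kernel API. The quoted loop assigns rc but always returns 0; unlike it, the sketch stops at the first error and returns it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed geometry (hypothetical names): 4 KiB base pages, 2 MiB PMD blocks. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_PMD_SIZE   (1UL << 21)
#define PAGES_PER_PMD     (SKETCH_PMD_SIZE / SKETCH_PAGE_SIZE) /* 512 here */

/* Hypothetical stand-in for remap_pfn_range(): map `size` bytes at `addr` to `pfn`. */
typedef int (*remap_fn)(void *ctx, unsigned long addr, unsigned long pfn,
			unsigned long size);

/*
 * Walk [vm_start, vm_end) in PMD-sized chunks (the last one may be shorter),
 * look up each chunk's physical block in `mappings`, convert it to a PFN
 * relative to `base`, and pass it to `remap`. Returns the first error.
 */
static int map_chunks(unsigned long vm_start, unsigned long vm_end,
		      uint64_t base, const unsigned long *mappings,
		      remap_fn remap, void *ctx)
{
	assert(((vm_start | vm_end) & (SKETCH_PAGE_SIZE - 1)) == 0);

	for (unsigned long addr = vm_start; addr < vm_end; addr += SKETCH_PMD_SIZE) {
		unsigned long remaining = vm_end - addr;
		unsigned long size = remaining < SKETCH_PMD_SIZE ? remaining : SKETCH_PMD_SIZE;
		unsigned long block = (addr - vm_start) / SKETCH_PMD_SIZE;
		unsigned long pfn = (unsigned long)(base >> SKETCH_PAGE_SHIFT) +
				    mappings[block] * PAGES_PER_PMD;
		int rc = remap(ctx, addr, pfn, size);

		if (rc)
			return rc; /* propagate instead of silently returning 0 */
	}
	return 0;
}

/* Hypothetical test double: records each call, fails on call `fail_at`. */
#define LOG_CAP 8
struct call_log {
	int n;       /* successful calls recorded so far */
	int fail_at; /* index of the call that returns -ENOMEM, or -1 */
	unsigned long addr[LOG_CAP], pfn[LOG_CAP], size[LOG_CAP];
};

static int record_remap(void *ctx, unsigned long addr, unsigned long pfn,
			unsigned long size)
{
	struct call_log *calls = ctx;

	if (calls->n == calls->fail_at)
		return -ENOMEM;
	if (calls->n < LOG_CAP) {
		calls->addr[calls->n] = addr;
		calls->pfn[calls->n] = pfn;
		calls->size[calls->n] = size;
	}
	calls->n++;
	return 0;
}
```

With 4 KiB pages, a 2 MiB + 4 KiB VMA whose mappings array is {7, 3} produces two calls. The first is a full 2 MiB chunk at PFN (base >> 12) + 7 * 512. The second is a 4 KiB tail at PFN (base >> 12) + 3 * 512. The literal 512 in the patch equals PMD_SIZE / PAGE_SIZE only with 4 KiB base pages (as on x86-64). The sketch derives the multiplier from the two size constants, which keeps them in step.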