From: Vivek Kasireddy
To: dri-devel@lists.freedesktop.org, linux-mm@kvack.org
Cc: Vivek Kasireddy, Mike Kravetz, David Hildenbrand, Gerd Hoffmann,
    Dongwon Kim, Andrew Morton, James Houghton, Jerome Marchand,
    Junxiao Chang, "Kirill A . Shutemov", Michal Hocko, Muchun Song
Subject: [PATCH v1 2/2] udmabuf: Add back support for mapping hugetlb pages
Date: Thu, 22 Jun 2023 00:27:10 -0700
Message-Id: <20230622072710.3707315-3-vivek.kasireddy@intel.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230622072710.3707315-1-vivek.kasireddy@intel.com>
References: <20230622072710.3707315-1-vivek.kasireddy@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

A user or admin can configure a VMM (Qemu) Guest's memory to be
backed by hugetlb pages for various reasons. However, a Guest OS
would still allocate (and pin) buffers that are backed by regular
4k sized pages. In order to map these buffers and create dma-bufs
for them on the Host, we first need to find the hugetlb pages where
the buffer allocations are located and then determine the offsets
of individual chunks (within those pages) and use this information
to eventually populate a scatterlist.

Testcase: default_hugepagesz=2M hugepagesz=2M hugepages=2500 options
were passed to the Host kernel and Qemu was launched with these
relevant options: qemu-system-x86_64 -m 4096m....
-device virtio-gpu-pci,max_outputs=1,blob=true,xres=1920,yres=1080
-display gtk,gl=on
-object memory-backend-memfd,hugetlb=on,id=mem1,size=4096M
-machine memory-backend=mem1

Replacing -display gtk,gl=on with -display gtk,gl=off above would
exercise the mmap handler.

Cc: Mike Kravetz
Cc: David Hildenbrand
Cc: Gerd Hoffmann
Cc: Dongwon Kim
Cc: Andrew Morton
Cc: James Houghton
Cc: Jerome Marchand
Cc: Junxiao Chang
Cc: Kirill A. Shutemov
Cc: Michal Hocko
Cc: Muchun Song
Signed-off-by: Vivek Kasireddy
---
 drivers/dma-buf/udmabuf.c | 97 +++++++++++++++++++++++++++++++++------
 1 file changed, 83 insertions(+), 14 deletions(-)

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 6de40c51d895..0ae44bce01e7 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -28,6 +29,7 @@ struct udmabuf {
 	struct page **pages;
 	struct sg_table *sg;
 	struct miscdevice *device;
+	pgoff_t *offsets;
 };
 
 static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
@@ -41,6 +43,10 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
 		return VM_FAULT_SIGBUS;
 
 	pfn = page_to_pfn(ubuf->pages[pgoff]);
+	if (ubuf->offsets) {
+		pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
+	}
+
 	return vmf_insert_pfn(vma, vmf->address, pfn);
 }
 
@@ -92,23 +98,40 @@ static struct sg_table *get_sg_table(struct device *dev, struct dma_buf *buf,
 {
 	struct udmabuf *ubuf = buf->priv;
 	struct sg_table *sg;
+	struct scatterlist *sgl;
+	unsigned long i = 0;
 	int ret;
 
 	sg = kzalloc(sizeof(*sg), GFP_KERNEL);
 	if (!sg)
 		return ERR_PTR(-ENOMEM);
-	ret = sg_alloc_table_from_pages(sg, ubuf->pages, ubuf->pagecount,
-					0, ubuf->pagecount << PAGE_SHIFT,
-					GFP_KERNEL);
-	if (ret < 0)
-		goto err;
+
+	if (ubuf->offsets) {
+		ret = sg_alloc_table(sg, ubuf->pagecount, GFP_KERNEL);
+		if (ret < 0)
+			goto err_alloc;
+
+		for_each_sg(sg->sgl, sgl, ubuf->pagecount, i) {
+			sg_set_page(sgl, ubuf->pages[i], PAGE_SIZE,
+				    ubuf->offsets[i]);
+		}
+	} else {
+		ret = sg_alloc_table_from_pages(sg, ubuf->pages,
+						ubuf->pagecount, 0,
+						ubuf->pagecount << PAGE_SHIFT,
+						GFP_KERNEL);
+		if (ret < 0)
+			goto err_alloc;
+
+	}
 	ret = dma_map_sgtable(dev, sg, direction, 0);
 	if (ret < 0)
-		goto err;
+		goto err_map;
 	return sg;
 
-err:
+err_map:
 	sg_free_table(sg);
+err_alloc:
 	kfree(sg);
 	return ERR_PTR(ret);
 }
@@ -145,6 +168,8 @@ static void release_udmabuf(struct dma_buf *buf)
 
 	for (pg = 0; pg < ubuf->pagecount; pg++)
 		put_page(ubuf->pages[pg]);
+	if (ubuf->offsets)
+		kfree(ubuf->offsets);
 	kfree(ubuf->pages);
 	kfree(ubuf);
 }
@@ -208,7 +233,9 @@ static long udmabuf_create(struct miscdevice *device,
 	struct udmabuf *ubuf;
 	struct dma_buf *buf;
 	pgoff_t pgoff, pgcnt, pgidx, pgbuf = 0, pglimit;
-	struct page *page;
+	struct page *page, *hpage = NULL;
+	pgoff_t hpoff, chunkoff, maxchunks;
+	struct hstate *hpstate;
 	int seals, ret = -EINVAL;
 	u32 i, flags;
 
@@ -244,7 +271,7 @@ static long udmabuf_create(struct miscdevice *device,
 		if (!memfd)
 			goto err;
 		mapping = memfd->f_mapping;
-		if (!shmem_mapping(mapping))
+		if (!shmem_mapping(mapping) && !is_file_hugepages(memfd))
 			goto err;
 		seals = memfd_fcntl(memfd, F_GET_SEALS, 0);
 		if (seals == -EINVAL)
@@ -255,16 +282,56 @@ static long udmabuf_create(struct miscdevice *device,
 			goto err;
 		pgoff = list[i].offset >> PAGE_SHIFT;
 		pgcnt = list[i].size >> PAGE_SHIFT;
+		if (is_file_hugepages(memfd)) {
+			if (!ubuf->offsets) {
+				ubuf->offsets = kmalloc_array(ubuf->pagecount,
+							      sizeof(*ubuf->offsets),
+							      GFP_KERNEL);
+				if (!ubuf->offsets) {
+					ret = -ENOMEM;
+					goto err;
+				}
+			}
+			hpstate = hstate_file(memfd);
+			hpoff = list[i].offset >> huge_page_shift(hpstate);
+			chunkoff = (list[i].offset &
+				    ~huge_page_mask(hpstate)) >> PAGE_SHIFT;
+			maxchunks = huge_page_size(hpstate) >> PAGE_SHIFT;
+		}
 		for (pgidx = 0; pgidx < pgcnt; pgidx++) {
-			page = shmem_read_mapping_page(mapping, pgoff + pgidx);
-			if (IS_ERR(page)) {
-				ret = PTR_ERR(page);
-				goto err;
+			if (is_file_hugepages(memfd)) {
+				if (!hpage) {
+					hpage = find_get_page_flags(mapping, hpoff,
+								    FGP_ACCESSED);
+					if (!hpage) {
+						ret = -EINVAL;
+						goto err;
+					}
+				}
+				get_page(hpage);
+				ubuf->pages[pgbuf] = hpage;
+				ubuf->offsets[pgbuf++] = chunkoff << PAGE_SHIFT;
+				if (++chunkoff == maxchunks) {
+					put_page(hpage);
+					hpage = NULL;
+					chunkoff = 0;
+					hpoff++;
+				}
+			} else {
+				page = shmem_read_mapping_page(mapping, pgoff + pgidx);
+				if (IS_ERR(page)) {
+					ret = PTR_ERR(page);
+					goto err;
+				}
+				ubuf->pages[pgbuf++] = page;
 			}
-			ubuf->pages[pgbuf++] = page;
 		}
 		fput(memfd);
 		memfd = NULL;
+		if (hpage) {
+			put_page(hpage);
+			hpage = NULL;
+		}
 	}
 
 	exp_info.ops = &udmabuf_ops;
@@ -289,6 +356,8 @@ static long udmabuf_create(struct miscdevice *device,
 		put_page(ubuf->pages[--pgbuf]);
 	if (memfd)
 		fput(memfd);
+	if (ubuf->offsets)
+		kfree(ubuf->offsets);
 	kfree(ubuf->pages);
 	kfree(ubuf);
 	return ret;
-- 
2.39.2
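
For readers following the offset arithmetic in udmabuf_create(), here is
a minimal userspace sketch of the same bookkeeping. It is not part of
the patch and does not touch any kernel API: locate_chunks(), struct
chunk_loc and SKETCH_PAGE_SHIFT are hypothetical names introduced only
for illustration, and a 4 KiB base page size is assumed. It mirrors how
hpoff, chunkoff and maxchunks advance as the loop walks a range that may
cross a huge page boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u	/* assumption: 4 KiB base pages */

/* Hypothetical record: where one 4 KiB chunk lives. */
struct chunk_loc {
	uint64_t hpage_index;	/* index of the huge page within the file */
	uint64_t byte_offset;	/* byte offset of the chunk in that huge page */
};

/*
 * Walk `size` bytes starting at file offset `offset` in base-page steps
 * and record, for each step, the huge page index and the offset within
 * that huge page. `huge_shift` is log2 of the huge page size (21 for
 * 2 MiB). Writes at most `max_out` entries and returns the number written.
 */
size_t locate_chunks(uint64_t offset, uint64_t size, unsigned int huge_shift,
		     struct chunk_loc *out, size_t max_out)
{
	uint64_t huge_size, hpoff, chunkoff, maxchunks, pgcnt, pgidx;
	size_t n = 0;

	assert(huge_shift >= SKETCH_PAGE_SHIFT && huge_shift < 64);

	huge_size = 1ULL << huge_shift;
	hpoff = offset >> huge_shift;
	/* offset & (huge_size - 1) plays the role of ~huge_page_mask() */
	chunkoff = (offset & (huge_size - 1)) >> SKETCH_PAGE_SHIFT;
	maxchunks = huge_size >> SKETCH_PAGE_SHIFT;
	pgcnt = size >> SKETCH_PAGE_SHIFT;

	for (pgidx = 0; pgidx < pgcnt && n < max_out; pgidx++) {
		out[n].hpage_index = hpoff;
		out[n].byte_offset = chunkoff << SKETCH_PAGE_SHIFT;
		n++;
		/* Last chunk of this huge page: move on to the next one. */
		if (++chunkoff == maxchunks) {
			chunkoff = 0;
			hpoff++;
		}
	}
	return n;
}
```

With 2 MiB huge pages, a 16 KiB range starting 8 KiB before the end of
the first huge page yields two chunks in huge page 0 followed by two
chunks at the start of huge page 1. This is the situation in which the
patch drops its reference on hpage and looks up the next one.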