From: Kefeng Wang
To: Peter Xu
Cc: Jason Gunthorpe, Sean Christopherson, Oscar Salvador, Axel Rasmussen, Will Deacon, Gavin Shan, Paolo Bonzini, Zi Yan, Andrew Morton, Catalin Marinas, Ingo Molnar, Alistair Popple, Borislav Petkov, David Hildenbrand, Thomas Gleixner, Dave Hansen, Alex Williamson, Yan Zhao
Subject: Re: [PATCH 00/19] mm: Support huge pfnmaps
Date: Mon, 19 Aug 2024 21:14:26 +0800
Message-ID: <498e0731-81a4-4f75-95b4-a8ad0bcc7665@huawei.com>
References: <20240809160909.1023470-1-peterx@redhat.com> <20240814123715.GB2032816@nvidia.com> <1147332f-790e-487f-8816-1860b8744ab2@huawei.com>

On 2024/8/16 22:33, Peter Xu wrote:
> On Fri, Aug 16, 2024 at 11:05:33AM +0800, Kefeng Wang wrote:
>>
>>
>> On 2024/8/16 3:20, Peter Xu wrote:
>>> On Wed, Aug 14, 2024 at 09:37:15AM -0300, Jason Gunthorpe wrote:
>>>>> Currently, only x86_64 (1G+2M) and arm64 (2M) are supported.
>>>>
>>>> There is definitely interest here in extending ARM to support the 1G
>>>> size too, what is missing?
>>>
>>> Currently PUD pfnmap relies on THP_PUD config option:
>>>
>>>   config ARCH_SUPPORTS_PUD_PFNMAP
>>>           def_bool y
>>>           depends on ARCH_SUPPORTS_HUGE_PFNMAP && HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>>
>>> Arm64 unfortunately doesn't yet support dax 1G, so not applicable yet.
>>>
>>> Ideally, pfnmap is too simple comparing to real THPs and it shouldn't
>>> require to depend on THP at all, but we'll need things like below to land
>>> first:
>>>
>>> https://lore.kernel.org/r/20240717220219.3743374-1-peterx@redhat.com
>>>
>>> I sent that first a while ago, but I didn't collect enough inputs, and I
>>> decided to unblock this series from that, so x86_64 shouldn't be affected,
>>> and arm64 will at least start to have 2M.
>>>
>>>>
>>>>> The other trick is how to allow gup-fast working for such huge mappings
>>>>> even if there's no direct sign of knowing whether it's a normal page or
>>>>> MMIO mapping. This series chose to keep the pte_special solution, so that
>>>>> it reuses similar idea on setting a special bit to pfnmap PMDs/PUDs so that
>>>>> gup-fast will be able to identify them and fail properly.
>>>>
>>>> Make sense
>>>>
>>>>> More architectures / More page sizes
>>>>> ------------------------------------
>>>>>
>>>>> Currently only x86_64 (2M+1G) and arm64 (2M) are supported.
>>>>>
>>>>> For example, if arm64 can start to support THP_PUD one day, the huge pfnmap
>>>>> on 1G will be automatically enabled.
>>
>> A draft patch to enable THP_PUD on arm64, only passed with DEBUG_VM_PGTABLE,
>> we may test pud pfnmaps on arm64.
>
> Thanks, Kefeng. It'll be great if this works already, as simple.
>
> Might be interesting to know whether it works already if you have some
> few-GBs GPU around on the systems.
>
> Logically as long as you have HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD selected
> below, 1g pfnmap will be automatically enabled when you rebuild the kernel.
> You can double check that by looking for this:
>
>    CONFIG_ARCH_SUPPORTS_PUD_PFNMAP=y
>
> And you can try to observe the mappings by enabling dynamic debug for
> vfio_pci_mmap_huge_fault(), then map the bar with vfio-pci and read
> something from it.

I don't have such a device, but we wrote a test driver that uses
vmf_insert_pfn_pmd/pud() in its huge_fault handler:

static const struct vm_operations_struct test_vm_ops = {
	.huge_fault = test_huge_fault,
	...
};

Reading and writing the mapping after mmap(,2M/1G,test_fd,...) works as
expected. Since it could also be used by DAX, let's send it separately.
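The special-bit idea quoted above (mark pfnmap PMDs/PUDs so gup-fast can recognize them and fail) can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: the entry layout (MODEL_PRESENT, MODEL_LEAF, MODEL_SPECIAL) and every model_* name are hypothetical, and real PMD/PUD bit positions are architecture-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for this model only; real layouts differ per arch. */
#define MODEL_PRESENT   (1ULL << 0)
#define MODEL_LEAF      (1ULL << 1)  /* entry maps a whole 2M/1G block */
#define MODEL_SPECIAL   (1ULL << 2)  /* set when a pfnmap (e.g. MMIO) is installed */
#define MODEL_PFN_SHIFT 12

typedef uint64_t model_entry_t;

enum model_gup_result {
	MODEL_GUP_OK,        /* lockless walk may take a page reference */
	MODEL_GUP_FALLBACK,  /* stop here and leave the decision to slow GUP */
};

/* Build a huge leaf entry; 'pfnmap' mimics installing it as a pfnmap. */
static inline model_entry_t model_make_leaf(uint64_t pfn, bool pfnmap)
{
	model_entry_t e;

	/* The pfn must fit above the flag bits. */
	assert(pfn < (1ULL << (64 - MODEL_PFN_SHIFT)));
	e = (pfn << MODEL_PFN_SHIFT) | MODEL_PRESENT | MODEL_LEAF;
	if (pfnmap)
		e |= MODEL_SPECIAL;
	return e;
}

/* gup-fast style decision made from the entry alone. */
static inline enum model_gup_result model_gup_fast_leaf(model_entry_t e)
{
	if (!(e & MODEL_PRESENT) || !(e & MODEL_LEAF))
		return MODEL_GUP_FALLBACK;  /* outside this model's scope */
	if (e & MODEL_SPECIAL)
		return MODEL_GUP_FALLBACK;  /* pfnmap: no struct page to reference */
	return MODEL_GUP_OK;
}
```

The check needs nothing but the entry itself, which matters because gup-fast walks the page tables locklessly and does not consult the VMA flags that would otherwise identify a VM_PFNMAP mapping.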
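Whether a huge_fault handler like the test driver's can install a 2M or 1G entry depends on alignment: the naturally aligned block containing the fault address must lie entirely inside the VMA, and the backing pfn must be aligned to the same size. A minimal user-space sketch of that decision follows, trying PUD first, then PMD, then ordinary PTEs. It assumes 4 KiB base pages (order 9 = 2 MiB, order 18 = 1 GiB); struct model_vma, model_order_fits() and model_pick_order() are hypothetical, and pud_ok/pmd_ok stand in for whether the architecture enables PUD/PMD pfnmaps (pud_ok would be false on arm64 until THP_PUD is supported).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumes 4 KiB base pages: order 9 -> 2 MiB, order 18 -> 1 GiB. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PMD_ORDER  9
#define MODEL_PUD_ORDER  18

/* Hypothetical stand-in for the parts of a VMA the decision needs. */
struct model_vma {
	uint64_t start;     /* first byte of the mapping */
	uint64_t end;       /* one past the last byte */
	uint64_t base_pfn;  /* pfn backing 'start' (e.g. BAR start + offset) */
};

/* Can one entry of 2^order pages cover 'addr' inside this VMA? */
static inline bool model_order_fits(const struct model_vma *vma,
				    uint64_t addr, unsigned int order)
{
	uint64_t size = 1ULL << (MODEL_PAGE_SHIFT + order);
	uint64_t va = addr & ~(size - 1);  /* aligned block start */
	uint64_t pfn;

	if (va < vma->start || va + size > vma->end)
		return false;  /* block would spill outside the VMA */
	pfn = vma->base_pfn + ((va - vma->start) >> MODEL_PAGE_SHIFT);
	return (pfn & ((1ULL << order) - 1)) == 0;  /* physical side aligned too */
}

/* Largest order to install for 'addr'; 0 means fall back to PTEs. */
static inline unsigned int model_pick_order(const struct model_vma *vma,
					    uint64_t addr,
					    bool pud_ok, bool pmd_ok)
{
	assert(addr >= vma->start && addr < vma->end);
	if (pud_ok && model_order_fits(vma, addr, MODEL_PUD_ORDER))
		return MODEL_PUD_ORDER;
	if (pmd_ok && model_order_fits(vma, addr, MODEL_PMD_ORDER))
		return MODEL_PMD_ORDER;
	return 0;
}
```

For example, a 1 GiB VMA whose start address and backing pfn are both 1 GiB aligned gets order 18 when pud_ok is set, but order 9 when only 2M pfnmaps are available; shifting the backing pfn by one page defeats both.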