Date: Thu, 15 Aug 2024 15:20:35 -0400
From: Peter Xu
To: Jason Gunthorpe
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Sean Christopherson, Oscar Salvador, Axel Rasmussen,
	linux-arm-kernel@lists.infradead.org, x86@kernel.org,
	Will Deacon, Gavin Shan, Paolo Bonzini, Zi Yan, Andrew Morton,
	Catalin Marinas, Ingo Molnar, Alistair Popple, Borislav Petkov,
	David Hildenbrand, Thomas Gleixner, kvm@vger.kernel.org,
	Dave Hansen, Alex Williamson, Yan Zhao
Subject: Re: [PATCH 00/19] mm: Support huge pfnmaps
References: <20240809160909.1023470-1-peterx@redhat.com>
	<20240814123715.GB2032816@nvidia.com>
In-Reply-To: <20240814123715.GB2032816@nvidia.com>

On Wed, Aug 14, 2024 at 09:37:15AM -0300, Jason Gunthorpe wrote:
> > Currently, only x86_64 (1G+2M) and arm64 (2M) are supported.
>
> There is definitely interest here in extending ARM to support the 1G
> size too, what is missing?

Currently PUD pfnmap relies on the THP_PUD config option:

config ARCH_SUPPORTS_PUD_PFNMAP
	def_bool y
	depends on ARCH_SUPPORTS_HUGE_PFNMAP && HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD

Arm64 unfortunately doesn't support dax 1G yet, so this isn't applicable
there yet.

Ideally, pfnmap is much simpler than real THPs and shouldn't need to depend
on THP at all, but we'll need things like the below to land first:

https://lore.kernel.org/r/20240717220219.3743374-1-peterx@redhat.com

I sent that a while ago but didn't collect enough input, so I decided to
unblock this series from it: x86_64 shouldn't be affected, and arm64 will at
least start to have 2M.

>
> > The other trick is how to allow gup-fast working for such huge mappings
> > even if there's no direct sign of knowing whether it's a normal page or
> > MMIO mapping. This series chose to keep the pte_special solution, so that
> > it reuses similar idea on setting a special bit to pfnmap PMDs/PUDs so that
> > gup-fast will be able to identify them and fail properly.
>
> Make sense
>
> > More architectures / More page sizes
> > ------------------------------------
> >
> > Currently only x86_64 (2M+1G) and arm64 (2M) are supported.
> >
> > For example, if arm64 can start to support THP_PUD one day, the huge pfnmap
> > on 1G will be automatically enabled.
>
> Oh that sounds like a bigger step..

Just to mention, no real THP 1G is needed here for pfnmaps. The real gap is
only the pud helpers, which so far exist only with CONFIG_THP_PUD in
huge_memory.c.

>
> > VFIO is so far the only consumer for the huge pfnmaps after this series
> > applied. Besides above remap_pfn_range() generic optimization, device
> > driver can also try to optimize its mmap() on a better VA alignment for
> > either PMD/PUD sizes. This may, iiuc, normally require userspace changes,
> > as the driver doesn't normally decide the VA to map a bar. But I don't
> > think I know all the drivers to know the full picture.
>
> How does alignment work? In most cases I'm aware of the userspace does
> not use MAP_FIXED so the expectation would be for the kernel to
> automatically select a high alignment. I suppose your cases are
> working because qemu uses MAP_FIXED and naturally aligns the BAR
> addresses?
>
> > - x86_64 + AMD GPU
> >   - Needs Alex's modified QEMU to guarantee proper VA alignment to make
> >     sure all pages to be mapped with PUDs
>
> Oh :(

So I suppose this answers the above. :) Yes, alignment is needed.

--
Peter Xu
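
A minimal userspace sketch of the alignment requirement discussed above,
offered as an illustration rather than anything from the series or from
QEMU. It over-reserves address space with an anonymous PROT_NONE mapping and
trims it so the start is aligned to a PMD (2M) or PUD (1G) boundary. The
names align_up(), reserve_aligned(), PMD_ALIGN_2M and PUD_ALIGN_1G are
hypothetical, the sizes assume 4K base pages, and len is assumed to be a
multiple of the page size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumed huge mapping sizes with 4K base pages (hypothetical names). */
#define PMD_ALIGN_2M (2UL << 20)
#define PUD_ALIGN_1G (1UL << 30)

/* Round addr up to the next multiple of align (a power of two). */
static uintptr_t align_up(uintptr_t addr, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~((uintptr_t)align - 1);
}

/*
 * Reserve len bytes of address space whose start is aligned to align.
 * len is assumed to be a multiple of the page size.
 * Returns MAP_FAILED on error.
 */
static void *reserve_aligned(size_t len, size_t align)
{
	size_t span = len + align;	/* slack so an aligned start must fit */
	uint8_t *raw, *start, *end, *raw_end;

	raw = mmap(NULL, span, PROT_NONE,
		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	if (raw == MAP_FAILED)
		return MAP_FAILED;

	start = (uint8_t *)align_up((uintptr_t)raw, align);
	end = start + len;
	raw_end = raw + span;

	/* Give back the unaligned head and the unused tail. */
	if (start > raw)
		munmap(raw, (size_t)(start - raw));
	if (raw_end > end)
		munmap(end, (size_t)(raw_end - end));

	return start;
}
```

A caller would then map the device region over the reservation with
MAP_FIXED, e.g. mmap(addr, len, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_FIXED, device_fd, region_offset), where device_fd and
region_offset stand in for whatever the driver interface provides. Whether
the kernel actually installs PMD/PUD entries still depends on the driver and
on the series discussed here.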