From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BDD15C3DA42 for ; Wed, 17 Jul 2024 14:15:34 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 4F0996B0096; Wed, 17 Jul 2024 10:15:34 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 479BB6B0098; Wed, 17 Jul 2024 10:15:34 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2CCB96B0099; Wed, 17 Jul 2024 10:15:34 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 036596B0096 for ; Wed, 17 Jul 2024 10:15:33 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id AA364A0A92 for ; Wed, 17 Jul 2024 14:15:33 +0000 (UTC) X-FDA: 82349442546.13.4C6B6AA Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf11.hostedemail.com (Postfix) with ESMTP id 237414002D for ; Wed, 17 Jul 2024 14:15:30 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=ghhwTmoh; spf=pass (imf11.hostedemail.com: domain of peterx@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=peterx@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1721225691; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=cXQeX8Qk8lUZnYDevLFdRtA6b9n0o4qNfnEDKf/jSVY=; b=j+dGww9z9Se5Yf7i7VKlUPd0+nZ6mRLm7mcUl0cvj3J29PcQkwHQ44d6r9fsWV+dwOIW1u MjW048OJvBl7p1WHQK53HjqTSbYyY0ta3f+gJFGsx5S2ThbLFymgYbZp7+pvNonuiOZMUq xSgiDaQwP+NtbMNow0rk8/UoTJCLdgg= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1721225691; a=rsa-sha256; cv=none; b=Ga32XPt/Bxwh1lRhfHJ0TaeO8pxlHxx/uBM9/iPV52/qyGe9z+ElFzwLIMMdvkOUqJQaYl JoQPwDnMN9j8mOYPu4BnzGDfyAuSToPLbAQKEIeuaIjQN46AR5uoyIIo9D7GLEgfxtaBS/ zjjsKqluGTXR9Eoa6rnyTI1eUfoYESw= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=ghhwTmoh; spf=pass (imf11.hostedemail.com: domain of peterx@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=peterx@redhat.com; dmarc=pass (policy=none) header.from=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1721225730; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=cXQeX8Qk8lUZnYDevLFdRtA6b9n0o4qNfnEDKf/jSVY=; b=ghhwTmohSESUBvpAzoj9+tJDUscIfSxRnbrnwYmRHw8WIZiDTdTsrXM+bt2eQsYopwWMQz Q7FsoSx6sZILeQwT1tQfTxmZHiNkOw45p77OoL+jg3DHbVyH52v7QekwmlpbYnNudpfqiB 7HjkxvhLVIGZmVZIlS5s/ApNDKA9qrc= Received: from mail-qk1-f199.google.com (mail-qk1-f199.google.com [209.85.222.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-189-bTVyD9WFO0WyAkFDunRazA-1; Wed, 17 Jul 2024 10:15:28 -0400 X-MC-Unique: bTVyD9WFO0WyAkFDunRazA-1 Received: by mail-qk1-f199.google.com with SMTP id af79cd13be357-7a13cdf3d41so23054885a.0 for ; Wed, 17 Jul 2024 07:15:28 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1721225728; x=1721830528; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=cXQeX8Qk8lUZnYDevLFdRtA6b9n0o4qNfnEDKf/jSVY=; b=p8sYPan4YZ4N1Jk/E5E0jOWSWvaDDZW5mKId/B/f0mP94q6PK8mYo6WzIF5a5yaiAw DfJ4Pk+BFh/1iIg58vNKcYkpq84+jVLDKKv6nMzGvEaihnzOJ/I0uLgbNRiKp8d6Nldk YUOpO3z2vajEifBflY4pNxaf8sHhkszOzP6x4ZhSuif1T//hlzowGvNkLalBWKmxFr4S oKXr5BxUT77z1gA6fY/tDv9AEhR2WP0w4q/31BbmC4OYmoPVAqQbYK1AIqeGqnwjmnmc Fqd2bnW6+P2di4yLKX2kvgoDa9QdlSEho+/iPODXm1ix67+jwGIxr+XSkRFN3RI691tZ Casw== X-Forwarded-Encrypted: i=1; AJvYcCX1j9SQhi8qkMyIpKJo2b1WvhmATequhpXOtzTJ6GxULCf8kGKfLq6K6cpjcYm8DixYij+SEujpIF1xuzu7XqUIij0= X-Gm-Message-State: AOJu0YyjAzClsl8vzTZLzmFT0l/Bg3YolLkc76Cewpoi2q7IASSD0wID efZoeLzugiwmxC5eh1+10fmbDoVoZKr6QJf8a2axXySFYgkCqvOtOb9GnkJr8WOG4fl9AOcxOgy g7RaD7AuvhoPg29rpEbTT0KbJsxW8qwNANLG1hAlEIJdL7izn X-Received: by 2002:a05:620a:4151:b0:79e:fc87:b4fe with SMTP id af79cd13be357-7a187421448mr117806085a.1.1721225728021; Wed, 17 Jul 2024 07:15:28 -0700 (PDT) X-Google-Smtp-Source: AGHT+IG0HAnXSRTkAEbuW2uSdlrcmqCuOQ25HGgYGVOppwIg94eSab5lmYdAjiorePj+Oy6WT7B5AA== X-Received: by 2002:a05:620a:4151:b0:79e:fc87:b4fe with SMTP id af79cd13be357-7a187421448mr117803685a.1.1721225727455; Wed, 17 Jul 2024 07:15:27 -0700 (PDT) Received: from x1n (pool-99-254-121-117.cpe.net.cable.rogers.com. [99.254.121.117]) by smtp.gmail.com with ESMTPSA id af79cd13be357-7a160be34a2sm400369885a.59.2024.07.17.07.15.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 17 Jul 2024 07:15:26 -0700 (PDT) Date: Wed, 17 Jul 2024 10:15:24 -0400 From: Peter Xu To: Yan Zhao Cc: "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" , Andrew Morton , Alex Williamson , Jason Gunthorpe , Al Viro , Dave Hansen , Andy Lutomirski , Peter Zijlstra , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "Kirill A . Shutemov" , "x86@kernel.org" , "Tian, Kevin" , Pei Li , David Hildenbrand , David Wang <00107082@163.com>, Bert Karwatzki , Sergey Senozhatsky Subject: Re: [PATCH] mm/x86/pat: Only untrack the pfn range if unmap region Message-ID: References: <20240712144244.3090089-1-peterx@redhat.com> MIME-Version: 1.0 In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Disposition: inline X-Rspam-User: X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 237414002D X-Stat-Signature: jtxj5ka81zgczkfbgkmhhgs9c6iuyxaj X-HE-Tag: 1721225730-206286 X-HE-Meta: U2FsdGVkX1+GU10pJ3DhFK0IkHr2iDwyQK17RszglwYqOF9VYoY/GnhOMl9dKd68iUB95P7gkjcYpHgmH+6lxIYGAEy1TVYfuZY+QLwqSklHun2l1YkPt17kd8Emzn+GvLwmTp6MuIynTRmdhKQq5oUVJ3kLBZHQUg98jGX1b7otf07qgsvjp82y3SAGxxo+VE51V4aJSHCOTTufF0zunUmwCVaHwOJApHegGdk3hDUuW9jsSNYG+sc0pd6aH0xkvHc1V4p7tQqaGRlAackBHtXW4WQcrCP1U8Ar9U/JKXRWXFC9gfXkIaC/X5LyJwpMloC4HhtmrjAttVJ8revtWAi5WROxnIZhM9cMnXqQg7CNrE3w+NxzFy3Gkj2RJREQActe1UxkVNSwjvx1iHPLgUVLLxVlZX4KuYL4KDkQ6K5KCe2oYz+kTFOw+rh4oQKAMDCBDel8fEpyra2sU3RvRP8dwb5LwdrrqVSKSpgUajYJ2FwFC1RdI3RFbKSppBRZ+Cg/hlWXl6Nx3Ps7wnDF7zian8p4bLMTt8wxKPvVgW+PoYvvVs9YKlD4rnQv26Noe1jPrH0e9R8lp4SSRho6TGQYO1PepHncbo+BpVUKBfXMkUbK42a4F+t7jTmRafLVGaJo5wSZpeHm6LyBg53+jZdELYi4II63RfJzSvVEtkPEIFTMQsp5imfjlFykoo+qLYZBTtbyZzr+npRjcPS/f2zVao7nUX9PZ57Ew8xIc/QhY+JW4QMieHCJtk6/j6BlFKgCYQ21XbGJZURAthIpXFl6h7hMJEqLTF+ybv409X9O4RUgJJdwUWKS+MxpcDVZVTy7jMASAksq5rNJWS+uDqtIKGfeQjmtwYUvNtQhH8DoTrZiFZWsAJgpYDi/0RfwRrF5EXPmLKCj9+ZLLHwApVJ4eP4NdZNJat84SZLD3KV9dPTg7gz1E0chLojZvc76H2flWQnIdIgmr1s9IpJ I/qzjAaP o8C+LR4RcjjudJqJy8S0tIS9jTT+q6SWlWgSnucTjY2nNPt6B7k8kq011h6t1Q+U2V2vRzhf9cDgtkoT7jVsRiJDU2qDIUyBgqd0tRUgkCLiUyW4kiyjYRxnmDuaNvFdfxIOaCESqz/k32gUvTfi9+0dOVujaxtdh85BWs8Irr4eoxdtz1Fk+fPEvwKHUJRgsBBw+QekSDq/wbIB19tANFwZTF4NPB6nciayMiM4+sFBnJp1X1+KjLxVAuca921bvarJ2QbI2DDjSsqjuJCuhb7syJYq1IsBpa3SHVKgVGGQCSdJJHwevJdg2vF3AWO6aChYUDA2cUYq90xZGw+Fxg1lb/Y7RGFsB7FyB176VizdtuxTOmcL9X+QKlz3CkBQYhZEuHbMgnlqkltjJIUuQr/gxt7fAvikEP+GTirhfcr/wqhB4ccXar89JJ1C3MbfsnXBuuKJ/vNmEyfXUcT1O1hyPJxHX5qfkEY5xqWFlKbDdZXQe11ciliKjespDandqsblp4pl9jKl1pbrXSVsx/LE0njoo9ptIj1DucDTAQD+TFuzTNxM/Dy+PoypjIDUKUMr7/dq/XYKMPgsbPie26ANV1SZi99/GP0qHnqZ5L/ePlHR+USGVVp3eh4HxzSvxoF+W2pPhvcvKEYxj/yKbg9RvEXT/URqncQGLlwZZKUA3HEESogRFt99/YP9XjOn1PQgQqSUyW0TU+7VVfb6+roSoyS5pUWmxQeM2JJdceS8tkzW3FhWbjs7GLpBtiFyJzxdbjEFp16rYpaPlnuPArR8ju+mcraxkt8QPZB/LmIFYKb5U+GD9MAJIMhJLg4t7oONcM29AMboUbzlA80dxMz+Yyt4z96IgvNZJ5PR+2akS6URkBbBeH+ljO6BulMiVuC3mw0ki7IJKVGIqMR3rZcz6lPvALcLi+fFCLRuzIqe+8q2nVtxrU98tFkXPCkiumpcB5f2gxm6DFqiwBlmjhvLAXem7 9IKA+YPk 5DS5NrS7OYwOAV/r0h0aPg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Jul 17, 2024 at 09:38:34AM +0800, Yan Zhao wrote: > On Tue, Jul 16, 2024 at 03:01:50PM -0400, Peter Xu wrote: > > On Tue, Jul 16, 2024 at 05:13:29PM +0800, Yan Zhao wrote: > > > On Mon, Jul 15, 2024 at 10:29:59PM +0800, Peter Xu wrote: > > > > On Mon, Jul 15, 2024 at 03:08:58PM +0800, Yan Zhao wrote: > > > > > On Fri, Jul 12, 2024 at 10:42:44AM -0400, Peter Xu wrote: > > > > > > This patch is one patch of an old series [1] that got reposted standalone > > > > > > here, with the hope to fix some reported untrack_pfn() issues reported > > > > > > recently [2,3], where there used to be other fix [4] but unfortunately > > > > > > which looks like to cause other issues. The hope is this patch can fix it > > > > > > the right way. > > > > > > > > > > > > X86 uses pfn tracking to do pfnmaps. AFAICT, the tracking should normally > > > > > > start at mmap() of device drivers, then untracked when munmap(). However > > > > > > in the current code the untrack is done in unmap_single_vma(). This might > > > > > > be problematic. > > > > > > > > > > > > For example, unmap_single_vma() can be used nowadays even for zapping a > > > > > > single page rather than the whole vmas. It's very confusing to do whole > > > > > > vma untracking in this function even if a caller would like to zap one > > > > > > page. It could simply be wrong. > > > > > > > > > > > > Such issue won't be exposed by things like MADV_DONTNEED won't ever work > > > > > > for pfnmaps and it'll fail the madvise() already before reaching here. > > > > > > However looks like it can be triggered like what was reported where invoked > > > > > > from an unmap request from a file vma. > > > > > > > > > > > > There's also work [5] on VFIO (merged now) to allow tearing down MMIO > > > > > > pgtables before an munmap(), in which case we may not want to untrack the > > > > > > pfns if we're only tearing down the pgtables. IOW, we may want to keep the > > > > > > pfn tracking information as those pfn mappings can be restored later with > > > > > > the same vma object. Currently it's not an immediate problem for VFIO, as > > > > > > VFIO uses UC- by default, but it looks like there's plan to extend that in > > > > > > the near future. > > > > > > > > > > > > IIUC, this was overlooked when zap_page_range_single() was introduced, > > > > > > while in the past it was only used in the munmap() path which wants to > > > > > > always unmap the region completely. E.g., commit f5cc4eef9987 ("VM: make > > > > > > zap_page_range() callers that act on a single VMA use separate helper") is > > > > > > the initial commit that introduced unmap_single_vma(), in which the chunk > > > > > > of untrack_pfn() was moved over from unmap_vmas(). > > > > > > > > > > > > Recover that behavior to untrack pfnmap only when unmap regions. > > > > > > > > > > > > [1] https://lore.kernel.org/r/20240523223745.395337-1-peterx@redhat.com > > > > > > [2] https://groups.google.com/g/syzkaller-bugs/c/FeQZvSbqWbQ/m/tHFmoZthAAAJ > > > > > > [3] https://lore.kernel.org/r/20240712131931.20207-1-00107082@163.com > > > > > > [4] https://lore.kernel.org/all/20240710-bug12-v1-1-0e5440f9b8d3@gmail.com/ > > > > > > [5] https://lore.kernel.org/r/20240523195629.218043-1-alex.williamson@redhat.com > > > > > > > > > > > > Cc: Alex Williamson > > > > > > Cc: Jason Gunthorpe > > > > > > Cc: Al Viro > > > > > > Cc: Dave Hansen > > > > > > Cc: Andy Lutomirski > > > > > > Cc: Peter Zijlstra > > > > > > Cc: Thomas Gleixner > > > > > > Cc: Ingo Molnar > > > > > > Cc: Borislav Petkov > > > > > > Cc: Kirill A. Shutemov > > > > > > Cc: x86@kernel.org > > > > > > Cc: Yan Zhao > > > > > > Cc: Kevin Tian > > > > > > Cc: Pei Li > > > > > > Cc: David Hildenbrand > > > > > > Cc: David Wang <00107082@163.com> > > > > > > Cc: Bert Karwatzki > > > > > > Cc: Sergey Senozhatsky > > > > > > Signed-off-by: Peter Xu > > > > > > --- > > > > > > > > > > > > NOTE: I massaged the commit message comparing to the rfc post [1], the > > > > > > patch itself is untouched. Also removed rfc tag, and added more people > > > > > > into the loop. Please kindly help test this patch if you have a reproducer, > > > > > > as I can't reproduce it myself even with the syzbot reproducer on top of > > > > > > mm-unstable. Instead of further check on the reproducer, I decided to send > > > > > > this out first as we have a bunch of reproducers on the list now.. > > > > > > --- > > > > > > mm/memory.c | 5 ++--- > > > > > > 1 file changed, 2 insertions(+), 3 deletions(-) > > > > > > > > > > > > diff --git a/mm/memory.c b/mm/memory.c > > > > > > index 4bcd79619574..f57cc304b318 100644 > > > > > > --- a/mm/memory.c > > > > > > +++ b/mm/memory.c > > > > > > @@ -1827,9 +1827,6 @@ static void unmap_single_vma(struct mmu_gather *tlb, > > > > > > if (vma->vm_file) > > > > > > uprobe_munmap(vma, start, end); > > > > > > > > > > > > - if (unlikely(vma->vm_flags & VM_PFNMAP)) > > > > > > - untrack_pfn(vma, 0, 0, mm_wr_locked); > > > > > > - > > > > > Specifically to VFIO's case, looks it doesn't matter if untrack_pfn() is > > > > > called here, since remap_pfn_range() is not called in mmap() and fault > > > > > handler, and therefore (vma->vm_flags & VM_PAT) is always 0. > > > > > > > > Right when with current repo, but I'm thinking maybe we should have VM_PAT > > > > there.. > > > Yes, I agree. > > > > > > But, currently for VFIO, it cannot call io_remap_pfn_range() in the fault > > > handler since vm_flags_set() requires mmap lock held for write while > > > the fault handler can only hold mmap lock for read. > > > So, it relies on ioremap()/iounmap() to reserve/de-reserve memtypes, > > > without VM_PAT being set in vma. > > > > Right, neither vm_flags_set() nor io_remap_pfn_range() should be called in > > a fault handler. They should only be called in mmap() phase. > > > > When you mentioned ioremap(), it's only for the VGA device, right? I don't > > see it's used in the vfio-pci's fault handler. > Oh, it's pci_iomap() in the vfio-pci's fault handler, another version of > ioremap() :) Right. If to be explicit, I think it's in mmap(), and looks like Alex has a comment for that: /* * Even though we don't make use of the barmap for the mmap, * we need to request the region and the barmap tracks that. */ Maybe in 2012 that's a must? PS: when looking, that looks like a proper place to reuse vfio_pci_core_setup_barmap() also in the mmap() function. > > > > > > > > > > > > See what reserve_pfn_range() does right now: it'll make sure only one owner > > > > reserve this area, e.g. memtype_reserve() will fail if it has already been > > > > reserved. Then when succeeded as the first one to reserve the region, > > > > it'll make sure this mem type to also be synchronized to the kernel map > > > > (memtype_kernel_map_sync). > > > > > > > > So I feel like when MMIO disabled for a VFIO card, we may need to call > > > > reserve_pfn_range(), telling the kernel the mem type VFIO uses, even if the > > > > pgtable will be empty, and even if currently it's always UC- for now: > > > > > > > > vfio_pci_core_mmap(): > > > > vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); > > > > > > > > Then the memtype will be reserved even if it cannot be used. Otherwise I > > > > don't know whether there's side effects of kernel identity mapping where it > > > > mismatch later with what's mapped in the userspace via the vma, when MMIO > > > > got enabled again: pgtable started to be injected with a different memtype > > > > that specified only in the vma's pgprot. > > > Current VFIO relies on pci_iomap() to reserve MMIO pfns as UC-, thus > > > no VM_PAT in vmas. > > > > > > Calling remap_pfn_range() in the mmap() will cause problem to support > > > dynamic faulting. Looks there's still a window even with an immediate > > > unmap after mmap(). > > > > It can be conditionally injected. See Alex's commit (not yet posted > > anywhere, only used in our internal testing so far): > > > > https://github.com/xzpeter/linux/commit/f432e0e46c34e493943034be4cb9d6eb638f57d1 > > > I'm a bit confused by the code. > It looks that it will do the remap_pfn_range() in mmap() if hardware is ready, > and will just set vma flags if hardware is not ready. > > But if the hardware is not ready in mmap(), which code in the fault handler > will reserve pfn memtypes? I thought I answered that below.. :) I think we should use track_pfn_remap(). > > > > > > > Also, calling remap_pfn_range() in mmap() may not meet the requirement of > > > drivers to dynamic switch backend resources, e.g. as what's in > > > cxl_mmap_fault(), since one remap_pfn_range() cannot reserve memtypes for > > > all backend resources at once. > > > > > > So, is there any way for the driver to reserve memtypes and set VM_PAT in > > > fault handler? > > > > I must confess I'm not familiar with memtype stuff, and just started > > looking recently. > > > > Per my reading so far, we have these multiple ways of doing memtype > > reservations, and no matter which API we use (ioremap / track_pfn_remap / > > pci_iomap), the same memtype needs to be used otherwise the reservation > > will fail. Here ioremap / pci_iomap will involve one more vmap so that the > > virtual mapping will present already after the API returns. > Right, I understand in the same way :) > > > > > Then here IMHO track_pfn_remap() is the one that we should still use in > > page-level mmap() controls, because so far we don't want to map any MMIO > > range yet, however we want to say "hey this VMA maps something special", by > > reserving these memtype and set VM_PAT. We want the virtual mapping only > > later during a fault(). > > > > In short, it looks to me the right thing we should do is to manually invoke > > track_pfn_remap() in vfio-pci's mmap() hook. > > > > if (!vdev->pm_runtime_engaged && __vfio_pci_memory_enabled(vdev)) > > ret = remap_pfn_range(vma, vma->vm_start, pfn, > > req_len, vma->vm_page_prot); > > else > > // TODO: manually invoke track_pfn_remap() here > > vm_flags_set(vma, VM_IO | VM_PFNMAP | > > VM_DONTEXPAND | VM_DONTDUMP); > What if we have to remap another set of pfns in the "else" case? Use vmf_insert_pfn*(): these helpers do not reserve memtype but looks them up only. > > > > > Then the vma is registered, and untrack_pfn() should be automatically done > > when vma unmaps (and that relies on this patch to not do that too early). > Well, I'm concerned by one use case. > > 1. The driver calls remap_pfn_range() to reserve memtype for pfn1 + add > VM_PAT flag. > 2. Then unmap_single_vma() is called. With this patch, the pfn1 memtype is > still reserved. > 3. The fault handler calls vmf_insert_pfn() for another pfn2. > 4. unmap_vmas() is called. However, untrack_pfn() will only find pfn2 > instead of prevous pfn1. It depends on what's your exact question, if it's about pgtable: I don't think munmap() requires PFNMAP pgtables to always exist, and it'll simply skip none entries. And if it's about PAT tracking: that is exactly what I'm talking about below.. where I think untracking shouldn't rely on pgtables. I'll copy you too if I'll prepare some patches for the latter. With that patchset (plus this patch) it should fix David Wang's issue and similar, AFAICT. Thanks, > > > > From that POV, I still think this patch does the right thing and should be > > merged, even if we probably have one more issue to fix as David Wang > > reported.. > > > > I'll need to think about how to do that right, as I think that'll be needed > > as long as pfnmaps will support fault()s: it means when munmap() the > > pgtable may not present, and PAT cannot rely on walking the pgtable to know > > the base PFN anymore. > > > > Thanks, > > > > -- > > Peter Xu > > > -- Peter Xu