From: Jiaqi Yan <jiaqiyan@google.com>
To: Peter Xu <peterx@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Gavin Shan <gshan@redhat.com>,
Catalin Marinas <catalin.marinas@arm.com>,
x86@kernel.org, Ingo Molnar <mingo@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Thomas Gleixner <tglx@linutronix.de>,
Alistair Popple <apopple@nvidia.com>,
kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
Sean Christopherson <seanjc@google.com>,
Oscar Salvador <osalvador@suse.de>,
Jason Gunthorpe <jgg@nvidia.com>, Borislav Petkov <bp@alien8.de>,
Zi Yan <ziy@nvidia.com>,
Axel Rasmussen <axelrasmussen@google.com>,
David Hildenbrand <david@redhat.com>,
Yan Zhao <yan.y.zhao@intel.com>, Will Deacon <will@kernel.org>,
Kefeng Wang <wangkefeng.wang@huawei.com>,
Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [PATCH v2 00/19] mm: Support huge pfnmaps
Date: Wed, 28 Aug 2024 09:23:17 -0700
Message-ID: <CACw3F53kCBGzMcOzcum3waUtYNgpcMTxaEzMjBS_-W-gsYG05A@mail.gmail.com>
In-Reply-To: <Zs83JJhFY9S-Gxqc@x1n>
On Wed, Aug 28, 2024 at 7:41 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Aug 27, 2024 at 05:42:21PM -0700, Jiaqi Yan wrote:
> > On Tue, Aug 27, 2024 at 3:57 PM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Tue, Aug 27, 2024 at 03:36:07PM -0700, Jiaqi Yan wrote:
> > > > Hi Peter,
> > >
> > > Hi, Jiaqi,
> > >
> > > > I am curious whether any work is needed for unmap_mapping_range? If a
> > > > driver hugely remap_pfn_range()ed at 1G granularity, can the driver
> > > > unmap at PAGE_SIZE granularity? For example, when a PFN is
> > >
> > > Yes it can, but it'll invoke split_huge_pud(), which by default routes to
> > > removal of the whole pud right now (currently it only covers either DAX
> > > mappings or huge pfnmaps; it won't cover anonymous memory if that comes, for example).
> > >
> > > In that case it'll rely on the driver providing proper fault() /
> > > huge_fault() to refault things back with smaller sizes later when accessed
> > > again.
> >
> > I see, so the driver needs to drive the recovery process, and the code
> > needs to be in the driver.
> >
> > But it seems to me the recovery process will be more or less the same
> > across different drivers? In that case, does it make sense for
> > memory_failure to do the common things for all drivers?
> >
> > Instead of removing the whole pud, can the driver or memory_failure do
> > something similar to a non-struct-page version of split_huge_page(), so
> > the driver doesn't need to re-fault good pages back?
>
> I think we can, it's just that we don't yet have a valid use case.
>
> DAX is definitely fault-able.
>
> While for the new huge pfnmap, currently vfio is the only user, and vfio
> only needs to either zap all or map all. In that case there's no real
> need for what you described yet. Meanwhile it's also faultable, so
> if / when needed it should hopefully still do the work properly.
>
> I believe it's not a usual requirement for most of the other drivers
> either, as most of them don't even support fault(), afaiu. remap_pfn_range()
> can start to use huge mappings, however I'd expect they're mostly not ready
> for random tearing down of any MMIO mappings.
>
> It sounds doable to me when there's a need for what you're describing,
> though I don't think I know the use case well yet.
>
> >
> >
> > >
> > > > poisoned in the 1G mapping, it would be great if the mapping could be
> > > > split into 2M + 4K mappings, so only the single poisoned PFN
> > > > is lost. (Pretty much like the past proposal* to use HGM** to improve
> > > > hugetlb's memory failure handling.)
> > >
> > > Note that we're only talking about MMIO mappings here, in which case the
> > > PFN doesn't even have a struct page, so the whole poison idea shouldn't
> > > apply, afaiu.
> >
> > Yes, there won't be any struct page. Ankit proposed this patchset* for
> > handling poisoning. I wonder: if someday the vfio-nvgrace-gpu-pci
> > driver adopts your change via the new remap_pfn_range() (installing
> > PMD/PUD instead of PTE), and memory_failure_pfn still does
> > unmap_mapping_range(pfn_space->mapping, pfn << PAGE_SHIFT, PAGE_SIZE,
> > 0), can it somehow just work with no re-fault needed?
> >
> > * https://lore.kernel.org/lkml/20231123003513.24292-2-ankita@nvidia.com/#t
>
> I see now, interesting.. Thanks for the link.
>
> In the nvgpu case, one way is to do what you said: we can
> enhance the pmd/pud split for pfnmap, but maybe that's overkill.
Yeah, I just wanted to poke to see if splitting pmd/pud is some low-hanging fruit.
>
> I saw that the nvgpu driver will need a fault() anyway to detect poisoned
> PFNs, so it's also feasible that when the new nvgrace_gpu_vfio_pci_fault()
> supports huge pfnmaps it'll need to try to detect whether the whole
> faulting range contains any poisoned PFNs, then return FALLBACK if so
> (rather than VM_FAULT_HWPOISON).
>
> E.g., when 4K of a 2M range is poisoned, we'll erase the 2M completely. When
> an access happens, as long as the accessed 4K is not on top of the poisoned
> 4K, huge_fault() should still detect that a 4K range is poisoned, so
> it won't install a pmd but return FALLBACK; then fault() will see that
> the accessed 4K range is not poisoned and install a pte.
Thanks for illustrating the re-fault flow again. I think this should
work well for drivers (with large MMIO sizes) that care about memory
errors. We can put the pmd/pud split idea on the backlog and see if it is
needed in the future.
>
> Thanks,
>
> --
> Peter Xu
>