linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Gavin Shan <gshan@redhat.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	x86@kernel.org, Ingo Molnar <mingo@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Alistair Popple <apopple@nvidia.com>,
	kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	Sean Christopherson <seanjc@google.com>,
	Oscar Salvador <osalvador@suse.de>,
	Jason Gunthorpe <jgg@nvidia.com>, Borislav Petkov <bp@alien8.de>,
	Zi Yan <ziy@nvidia.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	David Hildenbrand <david@redhat.com>,
	Will Deacon <will@kernel.org>,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [PATCH v2 07/19] mm/fork: Accept huge pfnmap entries
Date: Tue, 3 Sep 2024 17:23:38 -0400	[thread overview]
Message-ID: <Ztd-WkEoFJGZ34xj@x1n> (raw)
In-Reply-To: <ZtVwLntpS0eJubFq@yzhao56-desk.sh.intel.com>

On Mon, Sep 02, 2024 at 03:58:38PM +0800, Yan Zhao wrote:
> On Mon, Aug 26, 2024 at 04:43:41PM -0400, Peter Xu wrote:
> > Teach the fork code to properly copy pfnmaps for pmd/pud levels.  Pud is
> > much easier, the write bit needs to be persisted though for writable and
> > shared pud mappings like PFNMAP ones, otherwise a follow up write in either
> > parent or child process will trigger a write fault.
> > 
> > Do the same for pmd level.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  mm/huge_memory.c | 29 ++++++++++++++++++++++++++---
> >  1 file changed, 26 insertions(+), 3 deletions(-)
> > 
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index e2c314f631f3..15418ffdd377 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -1559,6 +1559,24 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >  	pgtable_t pgtable = NULL;
> >  	int ret = -ENOMEM;
> >  
> > +	pmd = pmdp_get_lockless(src_pmd);
> > +	if (unlikely(pmd_special(pmd))) {
> > +		dst_ptl = pmd_lock(dst_mm, dst_pmd);
> > +		src_ptl = pmd_lockptr(src_mm, src_pmd);
> > +		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
> > +		/*
> > +		 * No need to recheck the pmd, it can't change with write
> > +		 * mmap lock held here.
> > +		 *
> > +		 * Meanwhile, making sure it's not a CoW VMA with writable
> > +		 * mapping, otherwise it means either the anon page wrongly
> > +		 * applied special bit, or we made the PRIVATE mapping be
> > +		 * able to wrongly write to the backend MMIO.
> > +		 */
> > +		VM_WARN_ON_ONCE(is_cow_mapping(src_vma->vm_flags) && pmd_write(pmd));
> > +		goto set_pmd;
> > +	}
> > +
> >  	/* Skip if can be re-fill on fault */
> >  	if (!vma_is_anonymous(dst_vma))
> >  		return 0;
> > @@ -1640,7 +1658,9 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >  	pmdp_set_wrprotect(src_mm, addr, src_pmd);
> >  	if (!userfaultfd_wp(dst_vma))
> >  		pmd = pmd_clear_uffd_wp(pmd);
> > -	pmd = pmd_mkold(pmd_wrprotect(pmd));
> > +	pmd = pmd_wrprotect(pmd);
> > +set_pmd:
> > +	pmd = pmd_mkold(pmd);
> >  	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
> >  
> >  	ret = 0;
> > @@ -1686,8 +1706,11 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >  	 * TODO: once we support anonymous pages, use
> >  	 * folio_try_dup_anon_rmap_*() and split if duplicating fails.
> >  	 */
> > -	pudp_set_wrprotect(src_mm, addr, src_pud);
> > -	pud = pud_mkold(pud_wrprotect(pud));
> > +	if (is_cow_mapping(vma->vm_flags) && pud_write(pud)) {
> > +		pudp_set_wrprotect(src_mm, addr, src_pud);
> > +		pud = pud_wrprotect(pud);
> > +	}
> Do we need the logic to clear dirty bit in the child as that in
> __copy_present_ptes()?  (and also for the pmd's case).
> 
> e.g.
> if (vma->vm_flags & VM_SHARED)
> 	pud = pud_mkclean(pud);

Yeah, good question.  I remember I thought about that when initially
working on these lines, but I forgot the details, or maybe I simply tried
to stick with the current code base, as the dirty bit used to be kept even
in the child here.

I'd expect there's only performance differences, but still sounds like I'd
better leave that to whoever knows the best on the implications, then draft
it as a separate patch but only when needed.

Thanks,

-- 
Peter Xu



  reply	other threads:[~2024-09-03 21:23 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-26 20:43 [PATCH v2 00/19] mm: Support huge pfnmaps Peter Xu
2024-08-26 20:43 ` [PATCH v2 01/19] mm: Introduce ARCH_SUPPORTS_HUGE_PFNMAP and special bits to pmd/pud Peter Xu
2024-08-26 20:43 ` [PATCH v2 02/19] mm: Drop is_huge_zero_pud() Peter Xu
2024-08-26 20:43 ` [PATCH v2 03/19] mm: Mark special bits for huge pfn mappings when inject Peter Xu
2024-08-28 15:31   ` David Hildenbrand
2024-08-26 20:43 ` [PATCH v2 04/19] mm: Allow THP orders for PFNMAPs Peter Xu
2024-08-28 15:31   ` David Hildenbrand
2024-08-26 20:43 ` [PATCH v2 05/19] mm/gup: Detect huge pfnmap entries in gup-fast Peter Xu
2024-08-26 20:43 ` [PATCH v2 06/19] mm/pagewalk: Check pfnmap for folio_walk_start() Peter Xu
2024-08-28  7:44   ` David Hildenbrand
2024-08-28 14:24     ` Peter Xu
2024-08-28 15:30       ` David Hildenbrand
2024-08-28 19:45         ` Peter Xu
2024-08-28 23:46           ` Jason Gunthorpe
2024-08-29  6:35             ` David Hildenbrand
2024-08-29 18:45               ` Peter Xu
2024-08-29 15:10           ` David Hildenbrand
2024-08-29 18:49             ` Peter Xu
2024-08-26 20:43 ` [PATCH v2 07/19] mm/fork: Accept huge pfnmap entries Peter Xu
2024-08-29 15:10   ` David Hildenbrand
2024-08-29 18:26     ` Peter Xu
2024-08-29 19:44       ` David Hildenbrand
2024-08-29 20:01         ` Peter Xu
2024-09-02  7:58   ` Yan Zhao
2024-09-03 21:23     ` Peter Xu [this message]
2024-09-09 22:25       ` Andrew Morton
2024-09-09 22:43         ` Peter Xu
2024-09-09 23:15           ` Andrew Morton
2024-09-10  0:08             ` Peter Xu
2024-09-10  2:52               ` Yan Zhao
2024-09-10 12:16                 ` Peter Xu
2024-09-11  2:16                   ` Yan Zhao
2024-09-11 14:34                     ` Peter Xu
2024-08-26 20:43 ` [PATCH v2 08/19] mm: Always define pxx_pgprot() Peter Xu
2024-08-26 20:43 ` [PATCH v2 09/19] mm: New follow_pfnmap API Peter Xu
2024-08-26 20:43 ` [PATCH v2 10/19] KVM: Use " Peter Xu
2024-08-26 20:43 ` [PATCH v2 11/19] s390/pci_mmio: " Peter Xu
2024-08-26 20:43 ` [PATCH v2 12/19] mm/x86/pat: Use the new " Peter Xu
2024-08-26 20:43 ` [PATCH v2 13/19] vfio: " Peter Xu
2024-08-26 20:43 ` [PATCH v2 14/19] acrn: " Peter Xu
2024-08-26 20:43 ` [PATCH v2 15/19] mm/access_process_vm: " Peter Xu
2024-08-26 20:43 ` [PATCH v2 16/19] mm: Remove follow_pte() Peter Xu
2024-09-01  4:33   ` Yu Zhao
2024-09-01 13:39     ` David Hildenbrand
2024-08-26 20:43 ` [PATCH v2 17/19] mm/x86: Support large pfn mappings Peter Xu
2024-08-26 20:43 ` [PATCH v2 18/19] mm/arm64: " Peter Xu
2025-03-19 22:22   ` Keith Busch
2025-03-19 22:46     ` Peter Xu
2025-03-19 22:53       ` Keith Busch
2024-08-26 20:43 ` [PATCH v2 19/19] vfio/pci: Implement huge_fault support Peter Xu
2024-08-27 22:36 ` [PATCH v2 00/19] mm: Support huge pfnmaps Jiaqi Yan
2024-08-27 22:57   ` Peter Xu
2024-08-28  0:42     ` Jiaqi Yan
2024-08-28  0:46       ` Jiaqi Yan
2024-08-28 14:24       ` Jason Gunthorpe
2024-08-28 16:10         ` Jiaqi Yan
2024-08-28 23:49           ` Jason Gunthorpe
2024-08-29 19:21             ` Jiaqi Yan
2024-09-04 15:52               ` Jason Gunthorpe
2024-09-04 16:38                 ` Jiaqi Yan
2024-09-04 16:43                   ` Jason Gunthorpe
2024-09-04 16:58                     ` Jiaqi Yan
2024-09-04 17:00                       ` Jason Gunthorpe
2024-09-04 17:07                         ` Jiaqi Yan
2024-09-09  3:56                           ` Ankit Agrawal
2024-08-28 14:41       ` Peter Xu
2024-08-28 16:23         ` Jiaqi Yan
2024-09-09  4:03 ` Ankit Agrawal
2024-09-09 15:03   ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Ztd-WkEoFJGZ34xj@x1n \
    --to=peterx@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=apopple@nvidia.com \
    --cc=axelrasmussen@google.com \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=gshan@redhat.com \
    --cc=jgg@nvidia.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@redhat.com \
    --cc=osalvador@suse.de \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=wangkefeng.wang@huawei.com \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    --cc=yan.y.zhao@intel.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox