linux-mm.kvack.org archive mirror
From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Yin Tirui <yintirui@gmail.com>, Lorenzo Stoakes <ljs@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Zi Yan <ziy@nvidia.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Nico Pache <npache@redhat.com>,
	Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
	Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Kiryl Shutsemau <kas@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 13/13] mm/huge_memory: add and use has_deposited_pgtable()
Date: Tue, 14 Apr 2026 11:44:16 +0200
Message-ID: <53d748d3-4150-4e7b-8c1f-4c58587e9183@kernel.org>
In-Reply-To: <2f29f66b-46db-4925-b922-4add61b633bf@gmail.com>

On 4/14/26 09:36, Yin Tirui wrote:
> Hi Lorenzo and David,
> 
> Sorry for the late reply.
> 
> On 4/7/26 18:48, Lorenzo Stoakes wrote:
>> On Thu, Apr 02, 2026 at 03:49:35PM +0800, Yin Tirui wrote:
>>>
>>>
>>>
>>> Hi Lorenzo,
>>>
>>> Thanks for the quick reply. I will definitely CC you on the v4 series.
>>
>> Thanks.
>>
>>>
>>>
>>> Here is the dilemma:
>>>
>>> Currently, VFIO uses vmf_insert_pfn_pmd() to create huge pfnmaps on page
>>> faults. This sets VM_PFNMAP in vfio_pci_core_mmap(), but it does not
>>> deposit a pgtable (unless arch_needs_pgtable_deposit() is true).
>>
>> Hmmm... it's only the VFIO and hyperv drivers using this.
>>
>> Wouldn't we generally want a deposited huge page here now we're allowing huge
>> PFN maps?
>>
>> Or are these _special cases_ where we have a PMD-sized entry but are not
>> necessarily wanting to treat it as THP?
>>
>> This is a real wrinkle in this whole series no?
>>
>> David - any thoughts?

Sorry, catching up with that now.

>>
>>>
>>> To resolve this,
>>>
>>> Option A: Force VFIO (vmf_insert_pfn_pmd) to also deposit pgtables. This
>>> unifies the VM_PFNMAP lifecycle. However, since VFIO can refault,
>>> depositing pgtables here incurs unnecessary memory overhead.
>>
>> How can VFIO refault as a PFN mapping? Does it intentionally sometimes
>> clear PTE entries to effect a refault, and implement a custom fault
>> handler?
>>
>> I guess having a fault handler makes it refaultable...
>>
>> I mean obviously that then contradicts the suggested comment above :)
>>
>> That seems to me to cast a bit of a question over the whole series - having
>> PMD mappings that are _sometimes_ THP and _sometimes_ not is weird (TM).
>>
>> And it'd suck to add - yet another very specific check - to determine if we
>> do, in fact, assume THP for a PMD sized PFN map.
> 
> Yes, exactly. VFIO and Hyper-V rely on their custom `.fault` handlers to
> dynamically build mappings. In contrast, `remap_pfn_range()` establishes
> static pre-mappings.
> 
>>
>>>
>>> Option B: Introduce a new VMA flag set during remap_pfn_range(), which
>>> we can explicitly check in has_deposited_pgtable().
>>
>> Yeah would rather not, that feels like a hack.
> 
> Agreed.
> 
>>
>>>
>>> Option C: Check vma->vm_ops->fault (and huge_fault). We would only
>>> deposit pgtables for mappings without fault handlers. However, this is
>>> fragile because a driver might still register a .fault() handler that
>>> simply returns VM_FAULT_SIGBUS.
>>
>> I mean again this is yet another check (TM). But probably the most preferable I
>> think.
>>
>> Wouldn't a driver doing that be somewhat redundant? E.g. in do_fault():
>>
>> 	if (!vma->vm_ops->fault) {
>> 		vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
>> 					       vmf->address, &vmf->ptl);
>> 		if (unlikely(!vmf->pte))
>> 			ret = VM_FAULT_SIGBUS;
>>
>> And so can expect maybe some more redundancy if they also happen to map
>> PMD-sized ranges? :)
>>
>> And the only two callers of vmf_insert_pfn_pmd() - hyperv and VFIO both
>> implement actual fault handlers anyway.
>>
>> So I think this is fine?
>>
> 
> I agree.
> 
> David, since Lorenzo also asked for your thoughts on the overall design
> aspect ("sometimes THP and sometimes not"), what is your opinion on
> this? Should we proceed with checking `!vma->vm_ops->fault` to
> differentiate the deposit behavior for huge PFNMAPs?

I mean, we need some indication to know also during folio splitting
whether we can just discard the PMD, as we can refault it later, or
whether we really have to install a PTE table.

What if someone used remap_pfn_range() on some part of the VMA, and
faults on another part?

Doesn't really work.

Do we have users of remap_pfn_range() that have ->fault set? If not, we
should probably just disallow this combination.

Then we know for sure whether something was installed through
remap_pfn_range() or through a fault handler.
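
To make that rule concrete, here is a minimal userspace sketch, not kernel
code. Every model_* name below is a hypothetical stand-in invented for
illustration, and the sketch assumes (as discussed in this thread) that a huge
PFN map set up by remap_pfn_range() carries a deposited page table while one
installed from a fault handler does not. The -1 return value is likewise an
assumption; the thread does not say which error the kernel would return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's definitions. */
struct model_vm_ops {
	int (*fault)(void *vmf);
	int (*huge_fault)(void *vmf, unsigned int order);
};

struct model_vma {
	const struct model_vm_ops *vm_ops;
	/*
	 * Models arch_needs_pgtable_deposit(). In the kernel this is an
	 * arch-wide property, not per-VMA; it is a field here only to keep
	 * the sketch small.
	 */
	bool arch_needs_deposit;
};

/* Example of the "fragile" case: a handler that exists but only fails. */
static int model_sigbus_fault(void *vmf)
{
	(void)vmf;
	return -1;	/* stands in for VM_FAULT_SIGBUS */
}

/* True if the VMA has any fault handler, i.e. a zapped PMD could refault. */
static bool model_vma_can_refault(const struct model_vma *vma)
{
	assert(vma != NULL);
	return vma->vm_ops &&
	       (vma->vm_ops->fault || vma->vm_ops->huge_fault);
}

/*
 * Proposed rule: refuse remap_pfn_range() on a VMA that has a fault
 * handler. Returns 0 if allowed, -1 if rejected (assumed error value).
 */
static int model_remap_pfn_range_allowed(const struct model_vma *vma)
{
	return model_vma_can_refault(vma) ? -1 : 0;
}

/*
 * With the combination disallowed, "no fault handler" implies the PMD
 * came from remap_pfn_range() and therefore has a deposited table.
 */
static bool model_has_deposited_pgtable(const struct model_vma *vma)
{
	return vma->arch_needs_deposit || !model_vma_can_refault(vma);
}
```

Note that a handler like model_sigbus_fault() still counts as "has a fault
handler" in this model, so such a VMA would be refused by the remap check
rather than silently ending up without a deposited table.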

-- 
Cheers,

David

