From: Yin Tirui <yintirui@gmail.com>
To: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>, Zi Yan <ziy@nvidia.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Nico Pache <npache@redhat.com>,
Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Kiryl Shutsemau <kas@kernel.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 13/13] mm/huge_memory: add and use has_deposited_pgtable()
Date: Thu, 2 Apr 2026 15:49:35 +0800
Message-ID: <c6748ac8-6dcf-41b5-a281-2f21069b1f2e@gmail.com>
In-Reply-To: <fe357495-2e45-443a-93b4-52bf21d3bdff@lucifer.local>

On 4/2/26 14:46, Lorenzo Stoakes (Oracle) wrote:
>
> I mean you would have needed to handle this case in any event, since this change
> is strictly an equivalent reworking of zap_huge_pmd().
>
> But it seems that doing so has clarified the requirements somewhat here :)
>
> I haven't had a look at that series yet (please cc this email if you weren't
> already, I do filter a lot of stuff due to how much mail I get daily)

Hi Lorenzo,

Thanks for the quick reply. I will definitely CC you on the v4 series.

>
> So if this is a PMD leaf entry it will be present and PFN map, so I'd have
> thought simply adding:
>
> 	/* Huge PFN map must deposit, as cannot refault. */
> 	if (vma_test(vma, VMA_PFNMAP_BIT))
> 		return true;
>
> Would suffice?

Here is the dilemma:

Currently, VFIO uses vmf_insert_pfn_pmd() to create huge pfnmaps on page
faults. The VMA has VM_PFNMAP set in vfio_pci_core_mmap(), but VFIO does
not deposit a pgtable (unless arch_needs_pgtable_deposit() is true), so
checking VM_PFNMAP alone would wrongly report a deposited pgtable there.

To resolve this, I see three options:

Option A: Force VFIO (vmf_insert_pfn_pmd()) to also deposit pgtables. This
unifies the VM_PFNMAP lifecycle. However, since VFIO can refault,
depositing pgtables here incurs unnecessary memory overhead.

Option B: Introduce a new VMA flag set during remap_pfn_range(), which
we can explicitly check in has_deposited_pgtable().

Option C: Check vma->vm_ops->fault (and huge_fault) and only deposit
pgtables for mappings without fault handlers. However, this is fragile
because a driver might still register a .fault() handler that simply
returns VM_FAULT_SIGBUS.

Do you have a preference among these, or perhaps another idea?
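
To make the trade-off concrete, here is a rough userspace model of the
three candidate checks. It is only a sketch, not kernel code: every
toy_* name, the TOY_VM_PFNMAP_REMAP flag and the three sample VMAs are
hypothetical stand-ins, and it assumes arch_needs_pgtable_deposit() is
false. In this model only the two remap-style mappings really have a
deposited pgtable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; none of these are the real kernel definitions. */
#define TOY_VM_PFNMAP        (1UL << 0) /* models VM_PFNMAP */
#define TOY_VM_PFNMAP_REMAP  (1UL << 1) /* hypothetical Option B flag */

struct toy_vm_ops {
	int (*fault)(void);
	int (*huge_fault)(void);
};

struct toy_vma {
	unsigned long flags;
	const struct toy_vm_ops *ops;
};

static int toy_huge_fault(void) { return 0; }    /* pretend: can refault */
static int toy_sigbus_fault(void) { return -1; } /* pretend: always SIGBUS */

static const struct toy_vm_ops toy_vfio_ops = { .huge_fault = toy_huge_fault };
static const struct toy_vm_ops toy_sigbus_ops = { .fault = toy_sigbus_fault };

/* VFIO-like: PFN map populated on fault, no deposited pgtable. */
static const struct toy_vma toy_vfio_vma = {
	.flags = TOY_VM_PFNMAP, .ops = &toy_vfio_ops,
};
/* remap_pfn_range()-like: cannot refault, so a pgtable was deposited. */
static const struct toy_vma toy_remap_vma = {
	.flags = TOY_VM_PFNMAP | TOY_VM_PFNMAP_REMAP, .ops = NULL,
};
/* Same as above, but the driver registered a .fault() that only fails. */
static const struct toy_vma toy_sigbus_vma = {
	.flags = TOY_VM_PFNMAP | TOY_VM_PFNMAP_REMAP, .ops = &toy_sigbus_ops,
};

/* Plain VM_PFNMAP check: treats every huge PFN map as deposited. */
static bool deposited_by_pfnmap(const struct toy_vma *vma)
{
	return (vma->flags & TOY_VM_PFNMAP) != 0;
}

/* Option B: rely on a dedicated flag set at remap time. */
static bool deposited_by_remap_flag(const struct toy_vma *vma)
{
	return (vma->flags & TOY_VM_PFNMAP_REMAP) != 0;
}

/* Option C: infer "cannot refault" from the absence of fault handlers. */
static bool deposited_by_no_fault_handler(const struct toy_vma *vma)
{
	return !vma->ops || (!vma->ops->fault && !vma->ops->huge_fault);
}
```

In this model the plain VM_PFNMAP check reports a deposit for the
VFIO-like VMA, which has none; Option C misses the deposit for the
SIGBUS-only driver; the Option B check is the only one of the three
that matches all three sample mappings.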
>
> By the way, I am wondering if the prot bits are correctly preserved on page
> table deposit, as this is key for pfn map (e.g. if the range is uncached, for
> instance). That's something to check and ensure is correct.
>
> I _suspect_ they will be, as we have pretty well established mechanisms for that
> (propagate vma->vm_page_prot etc.) but definitely worth making sure.
>

Yes, they are correctly preserved!

During a PMD split in __split_huge_pmd_locked(), we populate the
deposited pgtable like this:

	entry = pfn_pte(pmd_pfn(old_pmd), pmd_pgprot(old_pmd));
	set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);

The newly refactored pmd_pgprot() correctly extracts the exact
protection bits (including crucial cache modes like UC/WC for device
memory) from the huge PMD, strips the hardware-specific huge bit, and
returns a pure PTE-level pgprot_t.
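
As an illustration of that flow, here is a small userspace model of the
split. It is a sketch under stated assumptions, not the kernel
implementation: the 64-bit entry layout, the TOY_* bit positions and all
toy_* helpers are made up for the example, and only the overall shape
(extract PFN and prot from the PMD, drop the huge bit, fill
HPAGE_PMD_NR consecutive PTEs) follows the snippet above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary toy entry layout: low 12 bits are prot, the rest is the PFN. */
#define TOY_PFN_SHIFT     12
#define TOY_PROT_MASK     ((UINT64_C(1) << TOY_PFN_SHIFT) - 1)
#define TOY_PROT_PRESENT  (UINT64_C(1) << 0)
#define TOY_PROT_RW       (UINT64_C(1) << 1)
#define TOY_PROT_UC       (UINT64_C(1) << 4) /* models an uncached mode */
#define TOY_PROT_HUGE     (UINT64_C(1) << 7) /* valid on huge entries only */
#define TOY_HPAGE_PMD_NR  512

static uint64_t toy_pmd_pfn(uint64_t pmd)
{
	return pmd >> TOY_PFN_SHIFT;
}

/* Models pmd_pgprot(): keep the prot bits, strip the huge bit. */
static uint64_t toy_pmd_pgprot(uint64_t pmd)
{
	return (pmd & TOY_PROT_MASK) & ~TOY_PROT_HUGE;
}

static uint64_t toy_pfn_pte(uint64_t pfn, uint64_t prot)
{
	return (pfn << TOY_PFN_SHIFT) | prot;
}

/* Models set_ptes(): nr entries with consecutive PFNs and identical prot. */
static void toy_set_ptes(uint64_t *pte, uint64_t entry, size_t nr)
{
	assert(nr > 0);
	for (size_t i = 0; i < nr; i++)
		pte[i] = entry + ((uint64_t)i << TOY_PFN_SHIFT);
}

/* Fill a (pretend) deposited page table from a huge PMD entry. */
static void toy_split_huge_pmd(uint64_t old_pmd, uint64_t *pte)
{
	uint64_t entry = toy_pfn_pte(toy_pmd_pfn(old_pmd),
				     toy_pmd_pgprot(old_pmd));

	toy_set_ptes(pte, entry, TOY_HPAGE_PMD_NR);
}
```

Calling toy_split_huge_pmd() on an entry built with TOY_PROT_UC and
TOY_PROT_HUGE set should leave every resulting PTE with the UC bit set
and the huge bit clear, which is the property being discussed here.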
>>
>> [1]
>> https://lore.kernel.org/linux-mm/20260228070906.1418911-5-yintirui@huawei.com/
>>
>> --
>> Yin Tirui
>>
>
> Cheers, Lorenzo
--
Yin Tirui