From: Usama Arif <usama.arif@linux.dev>
To: "David Hildenbrand (Arm)" <david@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	lorenzo.stoakes@oracle.com, willy@infradead.org,
	linux-mm@kvack.org
Cc: fvdl@google.com, hannes@cmpxchg.org, riel@surriel.com,
	shakeel.butt@linux.dev, kas@kernel.org, baohua@kernel.org,
	dev.jain@arm.com, baolin.wang@linux.alibaba.com,
	npache@redhat.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com,
	vbabka@suse.cz, lance.yang@linux.dev,
	linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [RFC 1/2] mm: thp: allocate PTE page tables lazily at split time
Date: Wed, 11 Feb 2026 13:47:16 +0000	[thread overview]
Message-ID: <82d2de0a-c321-4254-aca0-7d63abd4f1af@linux.dev> (raw)
In-Reply-To: <66386da6-6a7c-4968-9167-71f99dd498ad@kernel.org>



On 11/02/2026 13:35, David Hildenbrand (Arm) wrote:
> On 2/11/26 13:49, Usama Arif wrote:
>> When the kernel creates a PMD-level THP mapping for anonymous pages,
>> it pre-allocates a PTE page table and deposits it via
>> pgtable_trans_huge_deposit(). This deposited table is withdrawn during
>> PMD split or zap. The rationale was that split must not fail—if the
>> kernel decides to split a THP, it needs a PTE table to populate.
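>>
>> For illustration (a rough sketch of the existing flow, simplified from
>> the code touched below, not part of the diff):
>>
>>     /* fault time: pre-allocate a PTE table and stash it */
>>     pgtable = pte_alloc_one(vma->vm_mm);
>>     pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
>>
>>     /* split time: take the stashed table back and remap with it */
>>     pgtable = pgtable_trans_huge_withdraw(mm, pmd);
>>     pmd_populate(mm, &_pmd, pgtable);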
>>
>> However, every anon THP wastes 4KB (one page table page) that sits
>> unused in the deposit list for the lifetime of the mapping. On systems
>> with many THPs, this adds up to significant memory waste. The original
>> rationale is also not a real concern: it is OK for a split to fail, and
>> if the kernel cannot satisfy an order-0 allocation for the split, there
>> are much bigger problems. On large servers where you can easily have
>> 100s of GBs of THPs, the memory used for these tables is 200M per 100G.
>> This memory could be used for any other purpose, including allocating
>> the page tables required during split.
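>>
>> (Back-of-the-envelope: 100G of 2M THPs is ~51,200 PMD mappings, and
>> 51,200 * 4K of deposited PTE tables is ~200M, i.e. roughly 0.2% of the
>> mapped memory.)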
>>
>> This patch removes the pre-deposit for anonymous pages on architectures
>> where arch_needs_pgtable_deposit() returns false (every arch apart from
>> powerpc, which needs the deposit only when the hash MMU is in use, i.e.
>> when radix is not enabled) and allocates the PTE table lazily, only when
>> a split actually occurs. The split path is modified to accept a
>> caller-provided page table.
>>
>> PowerPC exception:
>>
>> It would have been great to remove the page table deposit code entirely,
>> in which case this would mostly have been a cleanup patch. Unfortunately,
>> PowerPC's hash MMU stores hash slot information in the deposited page
>> table, so the pre-deposit is necessary there. All deposit/withdraw paths
>> are guarded by arch_needs_pgtable_deposit(), so PowerPC behavior is
>> unchanged by this patch. On the plus side, arch_needs_pgtable_deposit()
>> always evaluates to false at compile time on non-PowerPC architectures,
>> so the pre-deposit code is not compiled in there.
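>>
>> (For reference, quoting the generic fallback from memory, so worth
>> double-checking: include/linux/pgtable.h has
>>
>>     #ifndef arch_needs_pgtable_deposit
>>     #define arch_needs_pgtable_deposit() (false)
>>     #endif
>>
>> and only powerpc book3s64 overrides it, returning true when the hash MMU
>> is in use. That constant false is what lets the compiler drop the
>> guarded deposit/withdraw code on other architectures.)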
>>
>> Why Split Failures Are Safe:
>>
>> If a system is under such severe memory pressure that even a 4K
>> allocation for a PTE table fails, there are far greater problems than a
>> THP split being delayed. The OOM killer will likely intervene before
>> this becomes an issue.
>> When pte_alloc_one() fails because a 4K page cannot be allocated, the
>> PMD split is aborted and the THP remains intact. I could not get a split
>> to fail in testing, as it is very difficult to make an order-0
>> allocation fail. Code analysis of what would happen if it did:
>>
>> - mprotect(): If the split fails in change_pmd_range(), it falls back
>> to change_pte_range(), which returns an error that causes the whole
>> function to be retried.
>>
>> - munmap() (partial THP range): zap_pte_range() returns early when
>> pte_offset_map_lock() fails, causing zap_pmd_range() to retry via pmd--.
>> For a full THP range, zap_huge_pmd() unmaps the entire PMD without a
>> split.
>>
>> - Memory reclaim (try_to_unmap()): Returns false, the folio is rotated
>> back onto the LRU and retried in the next reclaim cycle.
>>
>> - Migration / compaction (try_to_migrate()): Returns -EAGAIN, migration
>> skips this folio and retries it later.
>>
>> - CoW fault (wp_huge_pmd()): Returns VM_FAULT_FALLBACK, the fault is
>> retried.
>>
>> - madvise (MADV_COLD/PAGEOUT): split_folio() internally calls
>> try_to_migrate() with TTU_SPLIT_HUGE_PMD. If the PMD split fails,
>> try_to_migrate() returns false, split_folio() returns -EAGAIN, and
>> madvise returns 0 (success), silently skipping the region. This should
>> be fine: madvise is only advice and can fail for other reasons as well.
>>
>> Suggested-by: David Hildenbrand <david@kernel.org>
>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>> ---
>>   include/linux/huge_mm.h |   4 +-
>>   mm/huge_memory.c        | 144 ++++++++++++++++++++++++++++------------
>>   mm/khugepaged.c         |   7 +-
>>   mm/migrate_device.c     |  15 +++--
>>   mm/rmap.c               |  39 ++++++++++-
>>   5 files changed, 156 insertions(+), 53 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index a4d9f964dfdea..b21bb72a298c9 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -562,7 +562,7 @@ static inline bool thp_migration_supported(void)
>>   }
>>     void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
>> -               pmd_t *pmd, bool freeze);
>> +               pmd_t *pmd, bool freeze, pgtable_t pgtable);
>>   bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr,
>>                  pmd_t *pmdp, struct folio *folio);
>>   void map_anon_folio_pmd_nopf(struct folio *folio, pmd_t *pmd,
>> @@ -660,7 +660,7 @@ static inline void split_huge_pmd_address(struct vm_area_struct *vma,
>>           unsigned long address, bool freeze) {}
>>   static inline void split_huge_pmd_locked(struct vm_area_struct *vma,
>>                        unsigned long address, pmd_t *pmd,
>> -                     bool freeze) {}
>> +                     bool freeze, pgtable_t pgtable) {}
>>     static inline bool unmap_huge_pmd_locked(struct vm_area_struct *vma,
>>                        unsigned long addr, pmd_t *pmdp,
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 44ff8a648afd5..4c9a8d89fc8aa 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -1322,17 +1322,19 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>>       unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
>>       struct vm_area_struct *vma = vmf->vma;
>>       struct folio *folio;
>> -    pgtable_t pgtable;
>> +    pgtable_t pgtable = NULL;
>>       vm_fault_t ret = 0;
>>         folio = vma_alloc_anon_folio_pmd(vma, vmf->address);
>>       if (unlikely(!folio))
>>           return VM_FAULT_FALLBACK;
>>   -    pgtable = pte_alloc_one(vma->vm_mm);
>> -    if (unlikely(!pgtable)) {
>> -        ret = VM_FAULT_OOM;
>> -        goto release;
>> +    if (arch_needs_pgtable_deposit()) {
>> +        pgtable = pte_alloc_one(vma->vm_mm);
>> +        if (unlikely(!pgtable)) {
>> +            ret = VM_FAULT_OOM;
>> +            goto release;
>> +        }
>>       }
>>         vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
>> @@ -1347,14 +1349,18 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>>           if (userfaultfd_missing(vma)) {
>>               spin_unlock(vmf->ptl);
>>               folio_put(folio);
>> -            pte_free(vma->vm_mm, pgtable);
>> +            if (pgtable)
>> +                pte_free(vma->vm_mm, pgtable);
>>               ret = handle_userfault(vmf, VM_UFFD_MISSING);
>>               VM_BUG_ON(ret & VM_FAULT_FALLBACK);
>>               return ret;
>>           }
>> -        pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
>> +        if (pgtable) {
>> +            pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd,
>> +                           pgtable);
>> +            mm_inc_nr_ptes(vma->vm_mm);
>> +        }
>>           map_anon_folio_pmd_pf(folio, vmf->pmd, vma, haddr);
>> -        mm_inc_nr_ptes(vma->vm_mm);
>>           spin_unlock(vmf->ptl);
>>       }
>>   @@ -1450,9 +1456,11 @@ static void set_huge_zero_folio(pgtable_t pgtable, struct mm_struct *mm,
>>       pmd_t entry;
>>       entry = folio_mk_pmd(zero_folio, vma->vm_page_prot);
>>       entry = pmd_mkspecial(entry);
>> -    pgtable_trans_huge_deposit(mm, pmd, pgtable);
>> +    if (pgtable) {
>> +        pgtable_trans_huge_deposit(mm, pmd, pgtable);
>> +        mm_inc_nr_ptes(mm);
>> +    }
>>       set_pmd_at(mm, haddr, pmd, entry);
>> -    mm_inc_nr_ptes(mm);
>>   }
>>     vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>> @@ -1471,16 +1479,19 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>>       if (!(vmf->flags & FAULT_FLAG_WRITE) &&
>>               !mm_forbids_zeropage(vma->vm_mm) &&
>>               transparent_hugepage_use_zero_page()) {
>> -        pgtable_t pgtable;
>> +        pgtable_t pgtable = NULL;
>>           struct folio *zero_folio;
>>           vm_fault_t ret;
>>   -        pgtable = pte_alloc_one(vma->vm_mm);
>> -        if (unlikely(!pgtable))
>> -            return VM_FAULT_OOM;
>> +        if (arch_needs_pgtable_deposit()) {
>> +            pgtable = pte_alloc_one(vma->vm_mm);
>> +            if (unlikely(!pgtable))
>> +                return VM_FAULT_OOM;
>> +        }
>>           zero_folio = mm_get_huge_zero_folio(vma->vm_mm);
>>           if (unlikely(!zero_folio)) {
>> -            pte_free(vma->vm_mm, pgtable);
>> +            if (pgtable)
>> +                pte_free(vma->vm_mm, pgtable);
>>               count_vm_event(THP_FAULT_FALLBACK);
>>               return VM_FAULT_FALLBACK;
>>           }
>> @@ -1490,10 +1501,12 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>>               ret = check_stable_address_space(vma->vm_mm);
>>               if (ret) {
>>                   spin_unlock(vmf->ptl);
>> -                pte_free(vma->vm_mm, pgtable);
>> +                if (pgtable)
>> +                    pte_free(vma->vm_mm, pgtable);
>>               } else if (userfaultfd_missing(vma)) {
>>                   spin_unlock(vmf->ptl);
>> -                pte_free(vma->vm_mm, pgtable);
>> +                if (pgtable)
>> +                    pte_free(vma->vm_mm, pgtable);
>>                   ret = handle_userfault(vmf, VM_UFFD_MISSING);
>>                   VM_BUG_ON(ret & VM_FAULT_FALLBACK);
>>               } else {
>> @@ -1504,7 +1517,8 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>>               }
>>           } else {
>>               spin_unlock(vmf->ptl);
>> -            pte_free(vma->vm_mm, pgtable);
>> +            if (pgtable)
>> +                pte_free(vma->vm_mm, pgtable);
>>           }
>>           return ret;
>>       }
>> @@ -1836,8 +1850,10 @@ static void copy_huge_non_present_pmd(
>>       }
>>         add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
>> -    mm_inc_nr_ptes(dst_mm);
>> -    pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
>> +    if (pgtable) {
>> +        mm_inc_nr_ptes(dst_mm);
>> +        pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
>> +    }
>>       if (!userfaultfd_wp(dst_vma))
>>           pmd = pmd_swp_clear_uffd_wp(pmd);
>>       set_pmd_at(dst_mm, addr, dst_pmd, pmd);
>> @@ -1877,9 +1893,11 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>>       if (!vma_is_anonymous(dst_vma))
>>           return 0;
>>   -    pgtable = pte_alloc_one(dst_mm);
>> -    if (unlikely(!pgtable))
>> -        goto out;
>> +    if (arch_needs_pgtable_deposit()) {
>> +        pgtable = pte_alloc_one(dst_mm);
>> +        if (unlikely(!pgtable))
>> +            goto out;
>> +    }
>>         dst_ptl = pmd_lock(dst_mm, dst_pmd);
>>       src_ptl = pmd_lockptr(src_mm, src_pmd);
>> @@ -1897,7 +1915,8 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>>       }
>>         if (unlikely(!pmd_trans_huge(pmd))) {
>> -        pte_free(dst_mm, pgtable);
>> +        if (pgtable)
>> +            pte_free(dst_mm, pgtable);
>>           goto out_unlock;
>>       }
>>       /*
>> @@ -1923,7 +1942,8 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>>       if (unlikely(folio_try_dup_anon_rmap_pmd(src_folio, src_page, dst_vma, src_vma))) {
>>           /* Page maybe pinned: split and retry the fault on PTEs. */
>>           folio_put(src_folio);
>> -        pte_free(dst_mm, pgtable);
>> +        if (pgtable)
>> +            pte_free(dst_mm, pgtable);
>>           spin_unlock(src_ptl);
>>           spin_unlock(dst_ptl);
>>           __split_huge_pmd(src_vma, src_pmd, addr, false);
>> @@ -1931,8 +1951,10 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>>       }
>>       add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
>>   out_zero_page:
>> -    mm_inc_nr_ptes(dst_mm);
>> -    pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
>> +    if (pgtable) {
>> +        mm_inc_nr_ptes(dst_mm);
>> +        pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
>> +    }
>>       pmdp_set_wrprotect(src_mm, addr, src_pmd);
>>       if (!userfaultfd_wp(dst_vma))
>>           pmd = pmd_clear_uffd_wp(pmd);
>> @@ -2364,7 +2386,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>               zap_deposited_table(tlb->mm, pmd);
>>           spin_unlock(ptl);
>>       } else if (is_huge_zero_pmd(orig_pmd)) {
>> -        if (!vma_is_dax(vma) || arch_needs_pgtable_deposit())
>> +        if (arch_needs_pgtable_deposit())
>>               zap_deposited_table(tlb->mm, pmd);
>>           spin_unlock(ptl);
>>       } else {
>> @@ -2389,7 +2411,8 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>           }
>>             if (folio_test_anon(folio)) {
>> -            zap_deposited_table(tlb->mm, pmd);
>> +            if (arch_needs_pgtable_deposit())
>> +                zap_deposited_table(tlb->mm, pmd);
>>               add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
>>           } else {
>>               if (arch_needs_pgtable_deposit())
>> @@ -2490,7 +2513,8 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
>>               force_flush = true;
>>           VM_BUG_ON(!pmd_none(*new_pmd));
>>   -        if (pmd_move_must_withdraw(new_ptl, old_ptl, vma)) {
>> +        if (pmd_move_must_withdraw(new_ptl, old_ptl, vma) &&
>> +            arch_needs_pgtable_deposit()) {
>>               pgtable_t pgtable;
>>               pgtable = pgtable_trans_huge_withdraw(mm, old_pmd);
>>               pgtable_trans_huge_deposit(mm, new_pmd, pgtable);
>> @@ -2798,8 +2822,10 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, pm
>>       }
>>       set_pmd_at(mm, dst_addr, dst_pmd, _dst_pmd);
>>   -    src_pgtable = pgtable_trans_huge_withdraw(mm, src_pmd);
>> -    pgtable_trans_huge_deposit(mm, dst_pmd, src_pgtable);
>> +    if (arch_needs_pgtable_deposit()) {
>> +        src_pgtable = pgtable_trans_huge_withdraw(mm, src_pmd);
>> +        pgtable_trans_huge_deposit(mm, dst_pmd, src_pgtable);
>> +    }
>>   unlock_ptls:
>>       double_pt_unlock(src_ptl, dst_ptl);
>>       /* unblock rmap walks */
>> @@ -2941,10 +2967,9 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
>>   #endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
>>     static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
>> -        unsigned long haddr, pmd_t *pmd)
>> +        unsigned long haddr, pmd_t *pmd, pgtable_t pgtable)
>>   {
>>       struct mm_struct *mm = vma->vm_mm;
>> -    pgtable_t pgtable;
>>       pmd_t _pmd, old_pmd;
>>       unsigned long addr;
>>       pte_t *pte;
>> @@ -2960,7 +2985,16 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
>>        */
>>       old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd);
>>   -    pgtable = pgtable_trans_huge_withdraw(mm, pmd);
>> +    if (arch_needs_pgtable_deposit()) {
>> +        pgtable = pgtable_trans_huge_withdraw(mm, pmd);
>> +    } else {
>> +        VM_BUG_ON(!pgtable);
>> +        /*
>> +         * Account for the freshly allocated (in __split_huge_pmd) pgtable
>> +         * being used in mm.
>> +         */
>> +        mm_inc_nr_ptes(mm);
>> +    }
>>       pmd_populate(mm, &_pmd, pgtable);
>>         pte = pte_offset_map(&_pmd, haddr);
>> @@ -2982,12 +3016,11 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
>>   }
>>     static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>> -        unsigned long haddr, bool freeze)
>> +        unsigned long haddr, bool freeze, pgtable_t pgtable)
>>   {
>>       struct mm_struct *mm = vma->vm_mm;
>>       struct folio *folio;
>>       struct page *page;
>> -    pgtable_t pgtable;
>>       pmd_t old_pmd, _pmd;
>>       bool soft_dirty, uffd_wp = false, young = false, write = false;
>>       bool anon_exclusive = false, dirty = false;
>> @@ -3011,6 +3044,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>>            */
>>           if (arch_needs_pgtable_deposit())
>>               zap_deposited_table(mm, pmd);
>> +        if (pgtable)
>> +            pte_free(mm, pgtable);
>>           if (!vma_is_dax(vma) && vma_is_special_huge(vma))
>>               return;
>>           if (unlikely(pmd_is_migration_entry(old_pmd))) {
>> @@ -3043,7 +3078,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>>            * small page also write protected so it does not seems useful
>>            * to invalidate secondary mmu at this time.
>>            */
>> -        return __split_huge_zero_page_pmd(vma, haddr, pmd);
>> +        return __split_huge_zero_page_pmd(vma, haddr, pmd, pgtable);
>>       }
>>         if (pmd_is_migration_entry(*pmd)) {
>> @@ -3167,7 +3202,16 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>>        * Withdraw the table only after we mark the pmd entry invalid.
>>        * This's critical for some architectures (Power).
>>        */
>> -    pgtable = pgtable_trans_huge_withdraw(mm, pmd);
>> +    if (arch_needs_pgtable_deposit()) {
>> +        pgtable = pgtable_trans_huge_withdraw(mm, pmd);
>> +    } else {
>> +        VM_BUG_ON(!pgtable);
>> +        /*
>> +         * Account for the freshly allocated (in __split_huge_pmd) pgtable
>> +         * being used in mm.
>> +         */
>> +        mm_inc_nr_ptes(mm);
>> +    }
>>       pmd_populate(mm, &_pmd, pgtable);
>>         pte = pte_offset_map(&_pmd, haddr);
>> @@ -3263,11 +3307,13 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>>   }
>>     void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
>> -               pmd_t *pmd, bool freeze)
>> +               pmd_t *pmd, bool freeze, pgtable_t pgtable)
>>   {
>>       VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE));
>>       if (pmd_trans_huge(*pmd) || pmd_is_valid_softleaf(*pmd))
>> -        __split_huge_pmd_locked(vma, pmd, address, freeze);
>> +        __split_huge_pmd_locked(vma, pmd, address, freeze, pgtable);
>> +    else if (pgtable)
>> +        pte_free(vma->vm_mm, pgtable);
>>   }
>>     void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>> @@ -3275,13 +3321,24 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>>   {
>>       spinlock_t *ptl;
>>       struct mmu_notifier_range range;
>> +    pgtable_t pgtable = NULL;
>>         mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
>>                   address & HPAGE_PMD_MASK,
>>                   (address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE);
>>       mmu_notifier_invalidate_range_start(&range);
>> +
>> +    /* allocate pagetable before acquiring pmd lock */
>> +    if (vma_is_anonymous(vma) && !arch_needs_pgtable_deposit()) {
>> +        pgtable = pte_alloc_one(vma->vm_mm);
>> +        if (!pgtable) {
>> +            mmu_notifier_invalidate_range_end(&range);
> 
> When I last looked at this, I thought the clean thing to do is to let __split_huge_pmd() and friends return an error.
> 
> Let's take a look at walk_pmd_range() as one example:
> 
> if (walk->vma)
>     split_huge_pmd(walk->vma, pmd, addr);
> else if (pmd_leaf(*pmd) || !pmd_present(*pmd))
>     continue;
> 
> err = walk_pte_range(pmd, addr, next, walk);
> 
> Where walk_pte_range() just does a pte_offset_map_lock.
> 
>     pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
> 
> But if that fails (as the remapping failed), we will silently skip this range.
> 
> I don't think silently skipping is the right thing to do.
> 
> So I would think that all splitting functions have to be taught to return an error, and their callers to handle it accordingly. Then we can actually start returning errors.
> 

Ack. This was one of the cases where we would try again if needed.
I did a manual code analysis, which I included at the end of the commit
message, but agreed, it is best to return an error and handle it
accordingly. I will look into doing this for the next revision.
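
Roughly what I have in mind for the walk_pmd_range() case you quoted, once
the splitting functions can return an error (just a sketch, the exact
plumbing is to be worked out for the next revision):

	if (walk->vma) {
		/*
		 * split_huge_pmd() taught to return -ENOMEM when
		 * pte_alloc_one() fails for the replacement PTE table,
		 * instead of silently leaving the PMD intact.
		 */
		err = split_huge_pmd(walk->vma, pmd, addr);
		if (err)
			break;
	} else if (pmd_leaf(*pmd) || !pmd_present(*pmd))
		continue;

	err = walk_pte_range(pmd, addr, next, walk);

That way the caller sees the allocation failure directly, rather than
finding out via a failed pte_offset_map_lock() and silently skipping the
range.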


