linux-mm.kvack.org archive mirror
From: Usama Arif <usama.arif@linux.dev>
To: Matthew Wilcox <willy@infradead.org>
Cc: Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	david@kernel.org, Lorenzo Stoakes <ljs@kernel.org>,
	linux-mm@kvack.org, fvdl@google.com, hannes@cmpxchg.org,
	riel@surriel.com, shakeel.butt@linux.dev, kas@kernel.org,
	baohua@kernel.org, dev.jain@arm.com,
	baolin.wang@linux.alibaba.com, npache@redhat.com,
	Liam.Howlett@oracle.com, ryan.roberts@arm.com,
	Vlastimil Babka <vbabka@kernel.org>,
	lance.yang@linux.dev, linux-kernel@vger.kernel.org,
	kernel-team@meta.com, maddy@linux.ibm.com, mpe@ellerman.id.au,
	linuxppc-dev@lists.ozlabs.org, hca@linux.ibm.com,
	gor@linux.ibm.com, agordeev@linux.ibm.com,
	borntraeger@linux.ibm.com, svens@linux.ibm.com,
	linux-s390@vger.kernel.org, Nhat Pham <nphamcs@gmail.com>
Subject: Re: [v3 00/24] mm: thp: lazy PTE page table allocation at PMD split time
Date: Thu, 9 Apr 2026 13:48:14 +0100
Message-ID: <9fb076b5-ed33-458d-b39b-a2de3433a0da@linux.dev>
In-Reply-To: <adaxWs8BjCJB1aan@casper.infradead.org>



On 08/04/2026 20:49, Matthew Wilcox wrote:
> On Wed, Apr 08, 2026 at 04:06:29PM +0100, Usama Arif wrote:
>> On 06/04/2026 00:34, Hugh Dickins wrote:
>>> What would help a lot would be the implementation of swap entries
>>> at the PMD level.  Whether that would help enough, I'm sceptical:
>>> I do think it's foolish to depend upon the availability of huge
>>> contiguous swap extents, whatever the recent improvements there;
>>> but it would at least be an arguable justification.
>>>
>> Thanks for pointing this out. I should have thought of this as I
>> have been thinking about fork a lot for 1G THP and for this series.
>>
>> I am working on trying to make PMD level swap entries work. I hope
>> to have an RFC soon.
> 
> I think you may have missed Hugh's point a little bit.  If we do
> support PMD-level swap entries, that means we have to be able to find
> contiguous space in the swap space for 512 entries.  I don't know how
> hard that will be, but I can imagine it's not that easy.

Ah, so my understanding is that with CONFIG_THP_SWAP enabled, the swap
allocator already tries to allocate 512 contiguous swap slots for a THP.
With CONFIG_THP_SWAP, each swap cluster is exactly SWAPFILE_CLUSTER (512)
entries in size, meaning a 2M THP fits exactly into one cluster. Clusters
track their allocation order (ci->order), and the swap allocator maintains
per-order free lists (nonfull_clusters[order]), so THP-order allocations
are directed to clusters already dedicated to that order rather than
competing with base-page allocations.

The per-CPU caching (percpu_swap_cluster.si[order] / offset[order])
should further ensure that consecutive THP swap-outs from the same CPU
reuse the same cluster efficiently.
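
As a rough illustration of the per-order cluster idea described above,
here is a toy userspace model. It is only a sketch and not the kernel
code: every toy_* name is made up for this example, it assumes
SWAPFILE_CLUSTER == 512, and it leaves out freeing, the per-CPU cache
and fragmentation handling.

```c
#include <stddef.h>

/* All toy_* names are hypothetical; this is not kernel code. */
#define TOY_CLUSTER_SLOTS 512  /* assumes SWAPFILE_CLUSTER == 512 */
#define TOY_MAX_ORDER     9    /* 1 << 9 == 512 slots, i.e. one 2M THP */
#define TOY_NR_CLUSTERS   4
#define TOY_ORDER_NONE    (-1)

struct toy_cluster {
	int order;          /* order this cluster serves, or TOY_ORDER_NONE */
	unsigned int used;  /* slots handed out so far */
};

struct toy_swap {
	struct toy_cluster clusters[TOY_NR_CLUSTERS];
};

static void toy_swap_init(struct toy_swap *s)
{
	for (size_t i = 0; i < TOY_NR_CLUSTERS; i++) {
		s->clusters[i].order = TOY_ORDER_NONE;
		s->clusters[i].used = 0;
	}
}

/* Hand out nr slots from cluster idx; returns the first slot offset. */
static long toy_cluster_take(struct toy_swap *s, size_t idx, unsigned int nr)
{
	long off = (long)(idx * TOY_CLUSTER_SLOTS + s->clusters[idx].used);

	s->clusters[idx].used += nr;
	return off;
}

/*
 * Allocate (1 << order) contiguous slots.
 * Returns the first slot offset, or -1 if nothing suitable is left.
 */
static long toy_swap_alloc(struct toy_swap *s, int order)
{
	unsigned int nr;

	if (order < 0 || order > TOY_MAX_ORDER)
		return -1;
	nr = 1u << order;

	/* Pass 1: prefer a non-full cluster already dedicated to this order. */
	for (size_t i = 0; i < TOY_NR_CLUSTERS; i++) {
		struct toy_cluster *c = &s->clusters[i];

		if (c->order == order && c->used + nr <= TOY_CLUSTER_SLOTS)
			return toy_cluster_take(s, i, nr);
	}

	/* Pass 2: otherwise claim a completely free cluster for this order. */
	for (size_t i = 0; i < TOY_NR_CLUSTERS; i++) {
		if (s->clusters[i].order == TOY_ORDER_NONE) {
			s->clusters[i].order = order;
			return toy_cluster_take(s, i, nr);
		}
	}

	return -1;
}
```

In this model an order-9 request always consumes one whole free
cluster, and order-0 requests keep filling their own cluster, so the two
sizes do not fragment each other. It also shows the limit Matthew is
pointing at: once no free cluster is left, the order-9 request fails
even if many individual slots are still available.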

With a PMD swap entry, we will change how the page table records it
(1 PMD entry vs 512 PTE entries). Hence we won't need to allocate
page tables, which would help to address Hugh's valid concern
of having to allocate page tables if there is no page table deposit.
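
To put numbers on the "1 PMD entry vs 512 PTE entries" point, here is a
small back-of-the-envelope sketch. It assumes 4K base pages and 8-byte
page table entries (as on x86-64); the toy_* names are invented for this
example and do not exist in the kernel.

```c
#include <stdbool.h>

/* Assumptions for this sketch: 4K base pages, 8-byte table entries. */
#define TOY_PAGE_SIZE     4096UL
#define TOY_ENTRY_SIZE    8UL
#define TOY_PTRS_PER_PTE  (TOY_PAGE_SIZE / TOY_ENTRY_SIZE)    /* 512 */
#define TOY_PMD_SIZE      (TOY_PTRS_PER_PTE * TOY_PAGE_SIZE)  /* 2 MiB */

struct toy_swapout_cost {
	unsigned long entries_written;  /* swap entries stored in page tables */
	unsigned long pte_table_bytes;  /* PTE page table memory required */
};

/*
 * Cost of recording one swapped-out 2M THP in the page tables.
 * pmd_level == false: the PMD is split and 512 PTE swap entries are
 *                     written, so one PTE page table must exist.
 * pmd_level == true:  a single swap entry is stored in the PMD slot
 *                     itself, so no PTE page table is needed.
 */
static struct toy_swapout_cost toy_swapout_cost(bool pmd_level)
{
	struct toy_swapout_cost cost;

	if (pmd_level) {
		cost.entries_written = 1;
		cost.pte_table_bytes = 0;
	} else {
		cost.entries_written = TOY_PTRS_PER_PTE;
		cost.pte_table_bytes = TOY_PTRS_PER_PTE * TOY_ENTRY_SIZE;
	}
	return cost;
}
```

Under these assumptions the PTE-level case needs one 4096-byte page
table per swapped-out THP, which is exactly the allocation that could
fail without a deposit, while the PMD-level case needs none.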




Thread overview: 37+ messages
2026-03-27  2:08 Usama Arif
2026-03-27  2:08 ` [v3 01/24] mm: thp: make split_huge_pmd functions return int for error propagation Usama Arif
2026-03-27  2:08 ` [v3 02/24] mm: thp: propagate split failure from vma_adjust_trans_huge() Usama Arif
2026-03-27  2:08 ` [v3 03/24] mm: thp: handle split failure in copy_huge_pmd() Usama Arif
2026-03-27  2:08 ` [v3 04/24] mm: thp: handle split failure in do_huge_pmd_wp_page() Usama Arif
2026-03-27  2:08 ` [v3 05/24] mm: thp: handle split failure in zap_pmd_range() Usama Arif
2026-03-30 14:13   ` Kiryl Shutsemau
2026-03-30 15:09     ` David Hildenbrand (Arm)
2026-03-27  2:08 ` [v3 06/24] mm: thp: handle split failure in wp_huge_pmd() Usama Arif
2026-03-27  2:08 ` [v3 07/24] mm: thp: retry on split failure in change_pmd_range() Usama Arif
2026-03-30 14:27   ` Kiryl Shutsemau
2026-03-27  2:08 ` [v3 08/24] mm: thp: handle split failure in follow_pmd_mask() Usama Arif
2026-03-27  2:08 ` [v3 09/24] mm: handle walk_page_range() failure from THP split Usama Arif
2026-03-27  2:08 ` [v3 10/24] mm: thp: handle split failure in mremap move_page_tables() Usama Arif
2026-03-27  2:08 ` [v3 11/24] mm: thp: handle split failure in userfaultfd move_pages() Usama Arif
2026-03-27  2:08 ` [v3 12/24] mm: thp: handle split failure in device migration Usama Arif
2026-03-27  2:08 ` [v3 13/24] mm: proc: handle split_huge_pmd failure in pagemap_scan Usama Arif
2026-03-27  2:08 ` [v3 14/24] powerpc/mm: handle split_huge_pmd failure in subpage_prot Usama Arif
2026-03-27  2:08 ` [v3 15/24] fs/dax: handle split_huge_pmd failure in dax_iomap_pmd_fault Usama Arif
2026-03-27  2:08 ` [v3 16/24] mm: huge_mm: Make sure all split_huge_pmd calls are checked Usama Arif
2026-03-30 14:41   ` Kiryl Shutsemau
2026-03-27  2:08 ` [v3 17/24] mm: thp: allocate PTE page tables lazily at split time Usama Arif
2026-03-27  2:09 ` [v3 18/24] mm: thp: remove pgtable_trans_huge_{deposit/withdraw} when not needed Usama Arif
2026-03-27  2:09 ` [v3 19/24] mm: thp: add THP_SPLIT_PMD_FAILED counter Usama Arif
2026-03-27  2:09 ` [v3 20/24] selftests/mm: add THP PMD split test infrastructure Usama Arif
2026-03-27  2:09 ` [v3 21/24] selftests/mm: add partial_mprotect test for change_pmd_range Usama Arif
2026-03-27  2:09 ` [v3 22/24] selftests/mm: add partial_mlock test Usama Arif
2026-03-27  2:09 ` [v3 23/24] selftests/mm: add partial_mremap test for move_page_tables Usama Arif
2026-03-27  2:09 ` [v3 24/24] selftests/mm: add madv_dontneed_partial test Usama Arif
2026-03-27  8:51 ` [v3 00/24] mm: thp: lazy PTE page table allocation at PMD split time David Hildenbrand (Arm)
2026-03-27  9:25   ` Lorenzo Stoakes (Oracle)
2026-03-27 14:40     ` Usama Arif
2026-03-27 14:34   ` Usama Arif
2026-04-05 23:34 ` Hugh Dickins
2026-04-08 15:06   ` Usama Arif
2026-04-08 19:49     ` Matthew Wilcox
2026-04-09 12:48       ` Usama Arif [this message]
