From: Zi Yan <ziy@nvidia.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Janosch Frank <frankja@linux.ibm.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	David Hildenbrand <david@redhat.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Sven Schnelle <svens@linux.ibm.com>, Peter Xu <peterx@redhat.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	Arnd Bergmann <arnd@arndb.de>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Nico Pache <npache@redhat.com>,
	Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
	Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	Oscar Salvador <osalvador@suse.de>,
	Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	Matthew Brost <matthew.brost@intel.com>,
	Joshua Hahn <joshua.hahnjy@gmail.com>,
	Rakie Kim <rakie.kim@sk.com>, Byungchul Park <byungchul@sk.com>,
	Gregory Price <gourry@gourry.net>,
	Ying Huang <ying.huang@linux.alibaba.com>,
	Alistair Popple <apopple@nvidia.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
	Kemeng Shi <shikemeng@huaweicloud.com>,
	Kairui Song <kasong@tencent.com>, Nhat Pham <nphamcs@gmail.com>,
	Baoquan He <bhe@redhat.com>, Chris Li <chrisl@kernel.org>,
	SeongJae Park <sj@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Jason Gunthorpe <jgg@ziepe.ca>, Leon Romanovsky <leon@kernel.org>,
	Xu Xin <xu.xin16@zte.com.cn>,
	Chengming Zhou <chengming.zhou@linux.dev>,
	Jann Horn <jannh@google.com>, Miaohe Lin <linmiaohe@huawei.com>,
	Naoya Horiguchi <nao.horiguchi@gmail.com>,
	Pedro Falcato <pfalcato@suse.de>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Rik van Riel <riel@surriel.com>, Harry Yoo <harry.yoo@oracle.com>,
	Hugh Dickins <hughd@google.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	linux-s390@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-arch@vger.kernel.org,
	damon@lists.linux.dev
Subject: Re: [PATCH v3 02/16] mm: introduce leaf entry type and use to simplify leaf entry logic
Date: Mon, 10 Nov 2025 22:56:33 -0500
Message-ID: <3E8190A4-5B17-4A36-9025-F7E4FF1127AB@nvidia.com>
In-Reply-To: <c879383aac77d96a03e4d38f7daba893cd35fc76.1762812360.git.lorenzo.stoakes@oracle.com>

On 10 Nov 2025, at 17:21, Lorenzo Stoakes wrote:

> The kernel maintains leaf page table entries which contain either:
>
> - Nothing ('none' entries)
> - Present entries (that is stuff the hardware can navigate without fault)
> - Everything else that will cause a fault which the kernel handles
>
> In the 'everything else' group we include swap entries, but we also include
> a number of other things such as migration entries, device private entries
> and marker entries.
>
> Unfortunately this 'everything else' group expresses everything through
> a swp_entry_t type, and these entries are referred to as swap entries even
> though they may well not contain a... swap entry.
>
> This is compounded by the rather mind-boggling concept of a non-swap swap
> entry (checked via non_swap_entry()) and the means by which we twist and
> turn to satisfy this.
>
> This patch lays the foundation for reducing this confusion.
>
> We refer to 'everything else' as a 'software-defined leaf entry' or
> 'softleaf' for short. And in fact we scoop up the 'none' entries into this
> concept also so we are left with:
>
> - Present entries.
> - Softleaf entries (which may be empty).
>
> This allows for radical simplification across the board - one can simply
> convert any leaf page table entry to a leaf entry via softleaf_from_pte().
>
> If the entry is present, we return an empty leaf entry, so it is assumed
> the caller is aware that they must differentiate between the two categories
> of page table entries, checking for the former via pte_present().
>
> As a result, we can eliminate a number of places where we would otherwise
> need to use predicates to see if we can proceed with leaf page table entry
> conversion and instead just go ahead and do it unconditionally.
>
> We do so where we can, adjusting surrounding logic as necessary to
> integrate the new softleaf_t logic as far as seems reasonable at this
> stage.
>
> We typedef swp_entry_t to softleaf_t for the time being until the
> conversion can be complete, meaning everything remains compatible
> regardless of which type is used. We will eventually remove swp_entry_t
> when the conversion is complete.
>
> We introduce a new header file to keep things clear - leafops.h - this
> imports swapops.h so can directly replace swapops imports without issue, and
> we do so in all the files that require it.
>
> Additionally, add new leafops.h file to core mm maintainers entry.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>  MAINTAINERS                   |   1 +
>  fs/proc/task_mmu.c            |  26 +--
>  fs/userfaultfd.c              |   6 +-
>  include/linux/leafops.h       | 387 ++++++++++++++++++++++++++++++++++
>  include/linux/mm_inline.h     |   6 +-
>  include/linux/mm_types.h      |  25 +++
>  include/linux/swapops.h       |  28 ---
>  include/linux/userfaultfd_k.h |  51 +----
>  mm/hmm.c                      |   2 +-
>  mm/hugetlb.c                  |  37 ++--
>  mm/madvise.c                  |  16 +-
>  mm/memory.c                   |  41 ++--
>  mm/mincore.c                  |   6 +-
>  mm/mprotect.c                 |   6 +-
>  mm/mremap.c                   |   4 +-
>  mm/page_vma_mapped.c          |  11 +-
>  mm/shmem.c                    |   7 +-
>  mm/userfaultfd.c              |   6 +-
>  18 files changed, 502 insertions(+), 164 deletions(-)
>  create mode 100644 include/linux/leafops.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 2628431dcdfe..314910a70bbf 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -16257,6 +16257,7 @@ T:	git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>  F:	include/linux/gfp.h
>  F:	include/linux/gfp_types.h
>  F:	include/linux/highmem.h
> +F:	include/linux/leafops.h
>  F:	include/linux/memory.h
>  F:	include/linux/mm.h
>  F:	include/linux/mm_*.h
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index fc35a0543f01..24d26b49d870 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -14,7 +14,7 @@
>  #include <linux/rmap.h>
>  #include <linux/swap.h>
>  #include <linux/sched/mm.h>
> -#include <linux/swapops.h>
> +#include <linux/leafops.h>
>  #include <linux/mmu_notifier.h>
>  #include <linux/page_idle.h>
>  #include <linux/shmem_fs.h>
> @@ -1230,11 +1230,11 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
>  	if (pte_present(ptent)) {
>  		folio = page_folio(pte_page(ptent));
>  		present = true;
> -	} else if (is_swap_pte(ptent)) {
> -		swp_entry_t swpent = pte_to_swp_entry(ptent);
> +	} else {
> +		const softleaf_t entry = softleaf_from_pte(ptent);
>
> -		if (is_pfn_swap_entry(swpent))
> -			folio = pfn_swap_entry_folio(swpent);
> +		if (softleaf_has_pfn(entry))
> +			folio = softleaf_to_folio(entry);
>  	}
>
>  	if (folio) {

<snip>

>
> @@ -2330,18 +2330,18 @@ static unsigned long pagemap_page_category(struct pagemap_scan_private *p,
>  		if (pte_soft_dirty(pte))
>  			categories |= PAGE_IS_SOFT_DIRTY;
>  	} else if (is_swap_pte(pte)) {

This should be just “else” like smaps_hugetlb_range()’s change, right?
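
One thing to weigh before dropping the condition: is_swap_pte() is true only
for PTEs that are neither none nor present, so a bare "else" also admits none
PTEs. The userspace sketch below is a toy model, not the kernel code. Every
cat_* and CAT_* name is hypothetical, and the two functions only mimic the
shape of the branch in the hunk above. Whether the difference matters for
pagemap_page_category() depends on whether a none PTE can reach this branch,
which the quoted hunk does not show.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three broad PTE states. */
enum cat_kind { CAT_NONE, CAT_PRESENT, CAT_SWAP };

#define CAT_IS_PRESENT 0x1UL
#define CAT_IS_SWAPPED 0x2UL

/* Modelled on is_swap_pte(): neither none nor present. */
static bool cat_is_swap_pte(enum cat_kind kind)
{
	return kind != CAT_NONE && kind != CAT_PRESENT;
}

/* Shape of the branch as posted: "else if (is_swap_pte(pte))". */
static unsigned long cat_else_if(enum cat_kind kind)
{
	unsigned long categories = 0;

	assert(kind <= CAT_SWAP);
	if (kind == CAT_PRESENT)
		categories |= CAT_IS_PRESENT;
	else if (cat_is_swap_pte(kind))
		categories |= CAT_IS_SWAPPED;

	return categories;
}

/* Shape of the branch with a bare "else": none PTEs now fall through. */
static unsigned long cat_bare_else(enum cat_kind kind)
{
	unsigned long categories = 0;

	assert(kind <= CAT_SWAP);
	if (kind == CAT_PRESENT)
		categories |= CAT_IS_PRESENT;
	else
		categories |= CAT_IS_SWAPPED;

	return categories;
}
```

In this model the two variants agree for present and swap-like PTEs and differ
only for CAT_NONE, where the bare "else" reports CAT_IS_SWAPPED. In
smaps_hugetlb_range() the bare "else" is harmless because the body is guarded
by softleaf_has_pfn(); here the body sets a flag unconditionally.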

> -		swp_entry_t swp;
> +		softleaf_t entry;
>
>  		categories |= PAGE_IS_SWAPPED;
>  		if (!pte_swp_uffd_wp_any(pte))
>  			categories |= PAGE_IS_WRITTEN;
>
> -		swp = pte_to_swp_entry(pte);
> -		if (is_guard_swp_entry(swp))
> +		entry = softleaf_from_pte(pte);
> +		if (softleaf_is_guard_marker(entry))
>  			categories |= PAGE_IS_GUARD;
>  		else if ((p->masks_of_interest & PAGE_IS_FILE) &&
> -			 is_pfn_swap_entry(swp) &&
> -			 !folio_test_anon(pfn_swap_entry_folio(swp)))
> +			 softleaf_has_pfn(entry) &&
> +			 !folio_test_anon(softleaf_to_folio(entry)))
>  			categories |= PAGE_IS_FILE;
>
>  		if (pte_swp_soft_dirty(pte))

<snip>

> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index 137ce27ff68c..be20468fb5a9 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -3,7 +3,7 @@
>  #include <linux/rmap.h>
>  #include <linux/hugetlb.h>
>  #include <linux/swap.h>
> -#include <linux/swapops.h>
> +#include <linux/leafops.h>
>
>  #include "internal.h"
>
> @@ -107,15 +107,12 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw, unsigned long pte_nr)
>  	pte_t ptent = ptep_get(pvmw->pte);
>
>  	if (pvmw->flags & PVMW_MIGRATION) {
> -		swp_entry_t entry;
> -		if (!is_swap_pte(ptent))
> -			return false;
> -		entry = pte_to_swp_entry(ptent);
> +		const softleaf_t entry = softleaf_from_pte(ptent);

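We do not need the is_swap_pte() check here because softleaf_from_pte()
covers it: per the commit message, a present (or none) PTE yields an empty
leaf entry, which softleaf_is_migration() then rejects. Just reasoning
through the code for myself here.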
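
To make that reasoning concrete, here is a minimal userspace sketch in C. It
is not the kernel implementation: every toy_* type and function is a
hypothetical stand-in. The only behaviour taken from the commit message is
that converting a present (or none) PTE gives an empty leaf entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel definitions. */
enum toy_kind { TOY_NONE, TOY_PRESENT, TOY_SWAP, TOY_MIGRATION };

typedef struct { enum toy_kind kind; unsigned long pfn; } toy_pte_t;
typedef struct { enum toy_kind kind; unsigned long pfn; } toy_softleaf_t;

/* Present and none PTEs both convert to the empty leaf entry. */
static toy_softleaf_t toy_softleaf_from_pte(toy_pte_t pte)
{
	const toy_softleaf_t empty = { TOY_NONE, 0 };
	const toy_softleaf_t entry = { pte.kind, pte.pfn };

	if (pte.kind == TOY_PRESENT || pte.kind == TOY_NONE)
		return empty;

	return entry;
}

/* An empty entry is never a migration entry. */
static bool toy_softleaf_is_migration(toy_softleaf_t entry)
{
	return entry.kind == TOY_MIGRATION;
}

/*
 * Same shape as the PVMW_MIGRATION branch of check_pte() after the
 * patch: convert unconditionally, then test the entry type.
 */
static bool toy_check_migration(toy_pte_t pte, unsigned long *pfn)
{
	const toy_softleaf_t entry = toy_softleaf_from_pte(pte);

	assert(pfn != NULL);
	if (!toy_softleaf_is_migration(entry))
		return false;

	*pfn = entry.pfn;
	return true;
}
```

In this model a present PTE, a none PTE and an ordinary swap entry all make
toy_check_migration() return false without any separate is_swap_pte()-style
guard. Only a migration entry gets as far as the pfn assignment.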

>
> -		if (!is_migration_entry(entry))
> +		if (!softleaf_is_migration(entry))
>  			return false;
>
> -		pfn = swp_offset_pfn(entry);
> +		pfn = softleaf_to_pfn(entry);
>  	} else if (is_swap_pte(ptent)) {
>  		swp_entry_t entry;
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 6580f3cd24bb..395ca58ac4a5 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -66,7 +66,7 @@ static struct vfsmount *shm_mnt __ro_after_init;
>  #include <linux/falloc.h>
>  #include <linux/splice.h>
>  #include <linux/security.h>
> -#include <linux/swapops.h>
> +#include <linux/leafops.h>
>  #include <linux/mempolicy.h>
>  #include <linux/namei.h>
>  #include <linux/ctype.h>
> @@ -2286,7 +2286,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>  	struct address_space *mapping = inode->i_mapping;
>  	struct mm_struct *fault_mm = vma ? vma->vm_mm : NULL;
>  	struct shmem_inode_info *info = SHMEM_I(inode);
> -	swp_entry_t swap, index_entry;
> +	swp_entry_t swap;
> +	softleaf_t index_entry;
>  	struct swap_info_struct *si;
>  	struct folio *folio = NULL;
>  	bool skip_swapcache = false;
> @@ -2298,7 +2299,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>  	swap = index_entry;
>  	*foliop = NULL;
>
> -	if (is_poisoned_swp_entry(index_entry))
> +	if (softleaf_is_poison_marker(index_entry))
>  		return -EIO;
>
>  	si = get_swap_device(index_entry);
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index cc4ce205bbec..055ec1050776 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -10,7 +10,7 @@
>  #include <linux/pagemap.h>
>  #include <linux/rmap.h>
>  #include <linux/swap.h>
> -#include <linux/swapops.h>
> +#include <linux/leafops.h>
>  #include <linux/userfaultfd_k.h>
>  #include <linux/mmu_notifier.h>
>  #include <linux/hugetlb.h>
> @@ -208,7 +208,7 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd,
>  	 * MISSING|WP registered, we firstly wr-protect a none pte which has no
>  	 * page cache page backing it, then access the page.
>  	 */
> -	if (!pte_none(dst_ptep) && !is_uffd_pte_marker(dst_ptep))
> +	if (!pte_none(dst_ptep) && !pte_is_uffd_marker(dst_ptep))
>  		goto out_unlock;
>
>  	if (page_in_cache) {
> @@ -590,7 +590,7 @@ static __always_inline ssize_t mfill_atomic_hugetlb(
>  		if (!uffd_flags_mode_is(flags, MFILL_ATOMIC_CONTINUE)) {
>  			const pte_t ptep = huge_ptep_get(dst_mm, dst_addr, dst_pte);
>
> -			if (!huge_pte_none(ptep) && !is_uffd_pte_marker(ptep)) {
> +			if (!huge_pte_none(ptep) && !pte_is_uffd_marker(ptep)) {
>  				err = -EEXIST;
>  				hugetlb_vma_unlock_read(dst_vma);
>  				mutex_unlock(&hugetlb_fault_mutex_table[hash]);

The rest of the code looks good to me. I will check it again once
you fix the commit log and comments. Thank you for working on this.

Best Regards,
Yan, Zi

