linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
@ 2025-10-24  7:41 Lorenzo Stoakes
  2025-10-24  7:41 ` [RFC PATCH 01/12] mm: introduce and use pte_to_swp_entry_or_zero() Lorenzo Stoakes
                   ` (13 more replies)
  0 siblings, 14 replies; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-24  7:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	David Hildenbrand, Alexander Gordeev, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Sven Schnelle, Zi Yan,
	Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, Pedro Falcato,
	Pasha Tatashin, Rik van Riel, Harry Yoo, kvm, linux-s390,
	linux-kernel, linux-fsdevel, linux-mm

There's an established convention in the kernel that we treat leaf page
tables (so far at the PTE, PMD level) as containing 'swap entries' should
they be neither empty (i.e. p**_none() evaluating true) nor present
(i.e. p**_present() evaluating true).

However, at the same time we also have helper predicates - is_swap_pte(),
is_swap_pmd() - which are inconsistently used.

This is problematic, as it is logical to assume that should somebody wish
to operate upon a page table swap entry they should first check to see if
it is in fact one.

It also implies that perhaps, in future, we might introduce a non-present,
none page table entry that is not a swap entry.

This series resolves this issue by systematically eliminating all use of
the is_swap_pte() and is swap_pmd() predicates so we retain only the
convention that should a leaf page table entry be neither none nor present
it is a swap entry.

We also have the further issue that 'swap entry' is unfortunately a really
rather overloaded term and in fact refers to both entries for swap and for
other information such as migration entries, page table markers, and device
private entries.

We therefore have the rather 'unique' concept of a 'non-swap' swap entry.

This is deeply confusing, so this series goes further and eliminates the
non_swap_entry() predicate, replacing it with is_non_present_entry() - with
an eye to a new convention of referring to these non-swap 'swap entries' as
non-present.

It also introduces the is_swap_entry() predicate to explicitly and
logically refer to actual 'true' swap entries, improving code readibility,
avoiding the hideous convention of:

	if (!non_swap_entry(entry)) {
		...
	}

As part of these changes we also introduce a few other new predicates:

* pte_to_swp_entry_or_zero() - allows for convenient conversion from a PTE
  to a swap entry if present, or an empty swap entry if none. This is
  useful as many swap entry conversions are simply checking for flags for
  which this suffices.

* get_pte_swap_entry() - Retrieves a PTE swap entry if it truly is a swap
  entry (i.e. not a non-present entry), returning true if so, otherwise
  returns false. This simplifies a lot of logic that previously open-coded
  this.

* is_huge_pmd() - Determines if a PMD contains either a present transparent
  huge page entry or a huge non-present entry. This again simplifies a lot
  of logic that simply open-coded this.

REVIEWERS NOTE:

This series applies against mm-unstable as there are currently conflicts
with mm-new. Should the series receive community assent I will resolve
these at the point the RFC tag is removed.

I also intend to use this as a foundation for further work to add higher
order page table markers.

Lorenzo Stoakes (12):
  mm: introduce and use pte_to_swp_entry_or_zero()
  mm: avoid unnecessary uses of is_swap_pte()
  mm: introduce get_pte_swap_entry() and use it
  mm: use get_pte_swap_entry() in debug pgtable + remove is_swap_pte()
  fs/proc/task_mmu: refactor pagemap_pmd_range()
  mm: avoid unnecessary use of is_swap_pmd()
  mm: introduce is_huge_pmd() and use where appropriate
  mm/huge_memory: refactor copy_huge_pmd() non-present logic
  mm/huge_memory: refactor change_huge_pmd() non-present logic
  mm: remove remaining is_swap_pmd() users and is_swap_pmd()
  mm: rename non_swap_entry() to is_non_present_entry()
  mm: provide is_swap_entry() and use it

 arch/s390/mm/gmap_helpers.c   |   2 +-
 arch/s390/mm/pgtable.c        |   2 +-
 fs/proc/task_mmu.c            | 214 ++++++++++++++++++++--------------
 include/linux/huge_mm.h       |  49 +++++---
 include/linux/swapops.h       |  99 ++++++++++++++--
 include/linux/userfaultfd_k.h |  16 +--
 mm/debug_vm_pgtable.c         |  43 ++++---
 mm/filemap.c                  |   2 +-
 mm/hmm.c                      |   2 +-
 mm/huge_memory.c              | 189 ++++++++++++++++--------------
 mm/hugetlb.c                  |   6 +-
 mm/internal.h                 |  12 +-
 mm/khugepaged.c               |  29 ++---
 mm/madvise.c                  |  14 +--
 mm/memory.c                   |  62 +++++-----
 mm/migrate.c                  |   2 +-
 mm/mincore.c                  |   2 +-
 mm/mprotect.c                 |  45 ++++---
 mm/mremap.c                   |   9 +-
 mm/page_table_check.c         |  25 ++--
 mm/page_vma_mapped.c          |  30 +++--
 mm/swap_state.c               |   5 +-
 mm/swapfile.c                 |   3 +-
 mm/userfaultfd.c              |   2 +-
 24 files changed, 511 insertions(+), 353 deletions(-)

--
2.51.0


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [RFC PATCH 01/12] mm: introduce and use pte_to_swp_entry_or_zero()
  2025-10-24  7:41 [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion Lorenzo Stoakes
@ 2025-10-24  7:41 ` Lorenzo Stoakes
  2025-10-24  7:41 ` [RFC PATCH 02/12] mm: avoid unnecessary uses of is_swap_pte() Lorenzo Stoakes
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-24  7:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	David Hildenbrand, Alexander Gordeev, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Sven Schnelle, Zi Yan,
	Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, Pedro Falcato,
	Pasha Tatashin, Rik van Riel, Harry Yoo, kvm, linux-s390,
	linux-kernel, linux-fsdevel, linux-mm

When we check data in swap entries in the form of a predicate it is usually
the case that if the swap entry were zero, the predicate would evaluate as
false.

We can therefore simplify predicate checks where we first check to see if
the entry is indeed a swap entry before doing so by instead using a new
function which returns the zero swap entry (that is, swp_entry(0, 0))
should the entry be present.

The pte_none() case is also covered by this, as it will naturally evaluate
to swp_entry(0, 0).

We implement this via pte_to_swp_entry_or_zero(), which we then use in
is_pte_marker() and pte_marker_uffd_wp() both of which otherwise
unnecessarily utilise is_swap_pte().

We additionally update smaps_hugetlb_range() following the same pattern.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 fs/proc/task_mmu.c            |  4 ++--
 include/linux/swapops.h       | 20 +++++++++++++++++++-
 include/linux/userfaultfd_k.h |  7 +------
 mm/madvise.c                  |  5 +++--
 mm/page_vma_mapped.c          |  5 +----
 5 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index fc35a0543f01..a7c8501266f4 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1230,8 +1230,8 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
 	if (pte_present(ptent)) {
 		folio = page_folio(pte_page(ptent));
 		present = true;
-	} else if (is_swap_pte(ptent)) {
-		swp_entry_t swpent = pte_to_swp_entry(ptent);
+	} else {
+		swp_entry_t swpent = pte_to_swp_entry_or_zero(ptent);
 
 		if (is_pfn_swap_entry(swpent))
 			folio = pfn_swap_entry_folio(swpent);
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 2687928a8146..24eaf8825c6b 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -139,6 +139,24 @@ static inline swp_entry_t pte_to_swp_entry(pte_t pte)
 	return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
 }
 
+/**
+ * pte_to_swp_entry_or_zero() - Convert an arbitrary PTE entry to either its
+ * swap entry, or the zero swap entry if the PTE is either present or empty
+ * (none).
+ * @pte: The PTE entry we are evaluating.
+ *
+ * Returns: A valid swap entry or the zero swap entry if the PTE is present or
+ * none.
+ */
+static inline swp_entry_t pte_to_swp_entry_or_zero(pte_t pte)
+{
+	if (pte_present(pte))
+		return swp_entry(0, 0);
+
+	/* If none, this will return zero entry. */
+	return pte_to_swp_entry(pte);
+}
+
 /*
  * Convert the arch-independent representation of a swp_entry_t into the
  * arch-dependent pte representation.
@@ -438,7 +456,7 @@ static inline pte_marker pte_marker_get(swp_entry_t entry)
 
 static inline bool is_pte_marker(pte_t pte)
 {
-	return is_swap_pte(pte) && is_pte_marker_entry(pte_to_swp_entry(pte));
+	return is_pte_marker_entry(pte_to_swp_entry_or_zero(pte));
 }
 
 static inline pte_t make_pte_marker(pte_marker marker)
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index c0e716aec26a..4c65adff2e7a 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -447,12 +447,7 @@ static inline bool pte_marker_entry_uffd_wp(swp_entry_t entry)
 static inline bool pte_marker_uffd_wp(pte_t pte)
 {
 #ifdef CONFIG_PTE_MARKER_UFFD_WP
-	swp_entry_t entry;
-
-	if (!is_swap_pte(pte))
-		return false;
-
-	entry = pte_to_swp_entry(pte);
+	swp_entry_t entry = pte_to_swp_entry_or_zero(pte);
 
 	return pte_marker_entry_uffd_wp(entry);
 #else
diff --git a/mm/madvise.c b/mm/madvise.c
index fb1c86e630b6..f9f80b2e9d43 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1071,8 +1071,9 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked)
 
 static bool is_guard_pte_marker(pte_t ptent)
 {
-	return is_swap_pte(ptent) &&
-	       is_guard_swp_entry(pte_to_swp_entry(ptent));
+	const swp_entry_t entry = pte_to_swp_entry_or_zero(ptent);
+
+	return is_guard_swp_entry(entry);
 }
 
 static int guard_install_pud_entry(pud_t *pud, unsigned long addr,
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 137ce27ff68c..75a8fbb788b7 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -107,10 +107,7 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw, unsigned long pte_nr)
 	pte_t ptent = ptep_get(pvmw->pte);
 
 	if (pvmw->flags & PVMW_MIGRATION) {
-		swp_entry_t entry;
-		if (!is_swap_pte(ptent))
-			return false;
-		entry = pte_to_swp_entry(ptent);
+		swp_entry_t entry = pte_to_swp_entry_or_zero(ptent);
 
 		if (!is_migration_entry(entry))
 			return false;
-- 
2.51.0



^ permalink raw reply	[flat|nested] 47+ messages in thread

* [RFC PATCH 02/12] mm: avoid unnecessary uses of is_swap_pte()
  2025-10-24  7:41 [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion Lorenzo Stoakes
  2025-10-24  7:41 ` [RFC PATCH 01/12] mm: introduce and use pte_to_swp_entry_or_zero() Lorenzo Stoakes
@ 2025-10-24  7:41 ` Lorenzo Stoakes
  2025-10-24  7:41 ` [RFC PATCH 03/12] mm: introduce get_pte_swap_entry() and use it Lorenzo Stoakes
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-24  7:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	David Hildenbrand, Alexander Gordeev, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Sven Schnelle, Zi Yan,
	Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, Pedro Falcato,
	Pasha Tatashin, Rik van Riel, Harry Yoo, kvm, linux-s390,
	linux-kernel, linux-fsdevel, linux-mm

There's an established convention in the kernel that we treat PTEs as
containing swap entries (and the unfortunately named non-swap swap entries)
should they be neither empty (i.e. pte_none() evaluating true) nor present
(i.e. pte_present() evaluating true).

However, there is some inconsistency in how this is applied, as we also
have the is_swap_pte() helper which explicitly performs this check:

	/* check whether a pte points to a swap entry */
	static inline int is_swap_pte(pte_t pte)
	{
		return !pte_none(pte) && !pte_present(pte);
	}

As this represents a predicate, and it's logical to assume that in order to
establish that a PTE entry can correctly be manipulated as a swap/non-swap
entry, this predicate seems as if it must first be checked.

But we instead, we far more often utilise the established convention of
checking pte_none() / pte_present() before operating on entries as if they
were swap/non-swap.

This patch works towards correcting this inconsistency by removing all uses
of is_swap_pte() where we are already in a position where we perform
pte_none()/pte_present() checks anyway or otherwise it is clearly logical
to do so.

We also take advantage of the fact that pte_swp_uffd_wp() is only set on
swap entries.

Additionally, update comments referencing to is_swap_pte() and
non_swap_entry().

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 fs/proc/task_mmu.c            | 49 ++++++++++++++++++++++++-----------
 include/linux/userfaultfd_k.h |  9 ++++---
 mm/hugetlb.c                  |  6 ++---
 mm/internal.h                 |  6 ++---
 mm/khugepaged.c               | 29 +++++++++++----------
 mm/migrate.c                  |  2 +-
 mm/mprotect.c                 | 43 ++++++++++++++----------------
 mm/mremap.c                   |  7 +++--
 mm/page_table_check.c         | 13 ++++++----
 mm/page_vma_mapped.c          | 27 ++++++++++---------
 10 files changed, 106 insertions(+), 85 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index a7c8501266f4..5475acfa1a33 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1017,7 +1017,9 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
 		young = pte_young(ptent);
 		dirty = pte_dirty(ptent);
 		present = true;
-	} else if (is_swap_pte(ptent)) {
+	} else if (pte_none(ptent)) {
+		smaps_pte_hole_lookup(addr, walk);
+	} else {
 		swp_entry_t swpent = pte_to_swp_entry(ptent);
 
 		if (!non_swap_entry(swpent)) {
@@ -1038,9 +1040,6 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
 				present = true;
 			page = pfn_swap_entry_to_page(swpent);
 		}
-	} else {
-		smaps_pte_hole_lookup(addr, walk);
-		return;
 	}
 
 	if (!page)
@@ -1611,6 +1610,9 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma,
 	 */
 	pte_t ptent = ptep_get(pte);
 
+	if (pte_none(ptent))
+		return;
+
 	if (pte_present(ptent)) {
 		pte_t old_pte;
 
@@ -1620,7 +1622,7 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma,
 		ptent = pte_wrprotect(old_pte);
 		ptent = pte_clear_soft_dirty(ptent);
 		ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent);
-	} else if (is_swap_pte(ptent)) {
+	} else {
 		ptent = pte_swp_clear_soft_dirty(ptent);
 		set_pte_at(vma->vm_mm, addr, pte, ptent);
 	}
@@ -1923,6 +1925,9 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 	struct page *page = NULL;
 	struct folio *folio;
 
+	if (pte_none(pte))
+		goto out;
+
 	if (pte_present(pte)) {
 		if (pm->show_pfn)
 			frame = pte_pfn(pte);
@@ -1932,8 +1937,9 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 			flags |= PM_SOFT_DIRTY;
 		if (pte_uffd_wp(pte))
 			flags |= PM_UFFD_WP;
-	} else if (is_swap_pte(pte)) {
+	} else {
 		swp_entry_t entry;
+
 		if (pte_swp_soft_dirty(pte))
 			flags |= PM_SOFT_DIRTY;
 		if (pte_swp_uffd_wp(pte))
@@ -1941,6 +1947,7 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 		entry = pte_to_swp_entry(pte);
 		if (pm->show_pfn) {
 			pgoff_t offset;
+
 			/*
 			 * For PFN swap offsets, keeping the offset field
 			 * to be PFN only to be compatible with old smaps.
@@ -1969,6 +1976,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 		    __folio_page_mapped_exclusively(folio, page))
 			flags |= PM_MMAP_EXCLUSIVE;
 	}
+
+out:
 	if (vma->vm_flags & VM_SOFTDIRTY)
 		flags |= PM_SOFT_DIRTY;
 
@@ -2310,12 +2319,16 @@ static unsigned long pagemap_page_category(struct pagemap_scan_private *p,
 					   struct vm_area_struct *vma,
 					   unsigned long addr, pte_t pte)
 {
-	unsigned long categories = 0;
+	unsigned long categories;
+
+	if (pte_none(pte))
+		return 0;
 
 	if (pte_present(pte)) {
 		struct page *page;
 
-		categories |= PAGE_IS_PRESENT;
+		categories = PAGE_IS_PRESENT;
+
 		if (!pte_uffd_wp(pte))
 			categories |= PAGE_IS_WRITTEN;
 
@@ -2329,10 +2342,11 @@ static unsigned long pagemap_page_category(struct pagemap_scan_private *p,
 			categories |= PAGE_IS_PFNZERO;
 		if (pte_soft_dirty(pte))
 			categories |= PAGE_IS_SOFT_DIRTY;
-	} else if (is_swap_pte(pte)) {
+	} else {
 		swp_entry_t swp;
 
-		categories |= PAGE_IS_SWAPPED;
+		categories = PAGE_IS_SWAPPED;
+
 		if (!pte_swp_uffd_wp_any(pte))
 			categories |= PAGE_IS_WRITTEN;
 
@@ -2360,12 +2374,12 @@ static void make_uffd_wp_pte(struct vm_area_struct *vma,
 		old_pte = ptep_modify_prot_start(vma, addr, pte);
 		ptent = pte_mkuffd_wp(old_pte);
 		ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent);
-	} else if (is_swap_pte(ptent)) {
-		ptent = pte_swp_mkuffd_wp(ptent);
-		set_pte_at(vma->vm_mm, addr, pte, ptent);
-	} else {
+	} else if (pte_none(ptent)) {
 		set_pte_at(vma->vm_mm, addr, pte,
 			   make_pte_marker(PTE_MARKER_UFFD_WP));
+	} else {
+		ptent = pte_swp_mkuffd_wp(ptent);
+		set_pte_at(vma->vm_mm, addr, pte, ptent);
 	}
 }
 
@@ -2434,6 +2448,9 @@ static unsigned long pagemap_hugetlb_category(pte_t pte)
 {
 	unsigned long categories = PAGE_IS_HUGE;
 
+	if (pte_none(pte))
+		return categories;
+
 	/*
 	 * According to pagemap_hugetlb_range(), file-backed HugeTLB
 	 * page cannot be swapped. So PAGE_IS_FILE is not checked for
@@ -2441,6 +2458,7 @@ static unsigned long pagemap_hugetlb_category(pte_t pte)
 	 */
 	if (pte_present(pte)) {
 		categories |= PAGE_IS_PRESENT;
+
 		if (!huge_pte_uffd_wp(pte))
 			categories |= PAGE_IS_WRITTEN;
 		if (!PageAnon(pte_page(pte)))
@@ -2449,8 +2467,9 @@ static unsigned long pagemap_hugetlb_category(pte_t pte)
 			categories |= PAGE_IS_PFNZERO;
 		if (pte_soft_dirty(pte))
 			categories |= PAGE_IS_SOFT_DIRTY;
-	} else if (is_swap_pte(pte)) {
+	} else {
 		categories |= PAGE_IS_SWAPPED;
+
 		if (!pte_swp_uffd_wp_any(pte))
 			categories |= PAGE_IS_WRITTEN;
 		if (pte_swp_soft_dirty(pte))
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 4c65adff2e7a..a362e1619b95 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -462,13 +462,14 @@ static inline bool pte_marker_uffd_wp(pte_t pte)
 static inline bool pte_swp_uffd_wp_any(pte_t pte)
 {
 #ifdef CONFIG_PTE_MARKER_UFFD_WP
-	if (!is_swap_pte(pte))
-		return false;
+	swp_entry_t entry;
 
+	if (pte_present(pte))
+		return false;
 	if (pte_swp_uffd_wp(pte))
 		return true;
-
-	if (pte_marker_uffd_wp(pte))
+	entry = pte_to_swp_entry(pte);
+	if (pte_marker_entry_uffd_wp(entry))
 		return true;
 #endif
 	return false;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 86e672fcb305..4510029761ed 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5798,13 +5798,13 @@ static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
 
 	pte = huge_ptep_get_and_clear(mm, old_addr, src_pte, sz);
 
-	if (need_clear_uffd_wp && pte_marker_uffd_wp(pte))
+	if (need_clear_uffd_wp && pte_marker_uffd_wp(pte)) {
 		huge_pte_clear(mm, new_addr, dst_pte, sz);
-	else {
+	} else {
 		if (need_clear_uffd_wp) {
 			if (pte_present(pte))
 				pte = huge_pte_clear_uffd_wp(pte);
-			else if (is_swap_pte(pte))
+			else
 				pte = pte_swp_clear_uffd_wp(pte);
 		}
 		set_huge_pte_at(mm, new_addr, dst_pte, pte, sz);
diff --git a/mm/internal.h b/mm/internal.h
index cbd3d897b16c..b855a4412878 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -325,8 +325,7 @@ unsigned int folio_pte_batch(struct folio *folio, pte_t *ptep, pte_t pte,
 /**
  * pte_move_swp_offset - Move the swap entry offset field of a swap pte
  *	 forward or backward by delta
- * @pte: The initial pte state; is_swap_pte(pte) must be true and
- *	 non_swap_entry() must be false.
+ * @pte: The initial pte state; must be a swap entry
  * @delta: The direction and the offset we are moving; forward if delta
  *	 is positive; backward if delta is negative
  *
@@ -352,8 +351,7 @@ static inline pte_t pte_move_swp_offset(pte_t pte, long delta)
 
 /**
  * pte_next_swp_offset - Increment the swap entry offset field of a swap pte.
- * @pte: The initial pte state; is_swap_pte(pte) must be true and
- *	 non_swap_entry() must be false.
+ * @pte: The initial pte state; must be a swap entry.
  *
  * Increments the swap offset, while maintaining all other fields, including
  * swap type, and any swp pte bits. The resulting pte is returned.
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5b7276bc14b1..2079f270196f 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1019,7 +1019,8 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
 		}
 
 		vmf.orig_pte = ptep_get_lockless(pte);
-		if (!is_swap_pte(vmf.orig_pte))
+		if (pte_none(vmf.orig_pte) ||
+		    pte_present(vmf.orig_pte))
 			continue;
 
 		vmf.pte = pte;
@@ -1276,7 +1277,19 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
 	     _pte++, addr += PAGE_SIZE) {
 		pte_t pteval = ptep_get(_pte);
-		if (is_swap_pte(pteval)) {
+		if (pte_none_or_zero(pteval)) {
+			++none_or_zero;
+			if (!userfaultfd_armed(vma) &&
+			    (!cc->is_khugepaged ||
+			     none_or_zero <= khugepaged_max_ptes_none)) {
+				continue;
+			} else {
+				result = SCAN_EXCEED_NONE_PTE;
+				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
+				goto out_unmap;
+			}
+		}
+		if (!pte_present(pteval)) {
 			++unmapped;
 			if (!cc->is_khugepaged ||
 			    unmapped <= khugepaged_max_ptes_swap) {
@@ -1296,18 +1309,6 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 				goto out_unmap;
 			}
 		}
-		if (pte_none_or_zero(pteval)) {
-			++none_or_zero;
-			if (!userfaultfd_armed(vma) &&
-			    (!cc->is_khugepaged ||
-			     none_or_zero <= khugepaged_max_ptes_none)) {
-				continue;
-			} else {
-				result = SCAN_EXCEED_NONE_PTE;
-				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
-				goto out_unmap;
-			}
-		}
 		if (pte_uffd_wp(pteval)) {
 			/*
 			 * Don't collapse the page if any of the small
diff --git a/mm/migrate.c b/mm/migrate.c
index 4324fc01bfce..69d8b4a9db25 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -492,7 +492,7 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 	pte = ptep_get(ptep);
 	pte_unmap(ptep);
 
-	if (!is_swap_pte(pte))
+	if (pte_none(pte) || pte_present(pte))
 		goto out;
 
 	entry = pte_to_swp_entry(pte);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index c090bc063a31..e25ac9835cc2 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -345,7 +345,26 @@ static long change_pte_range(struct mmu_gather *tlb,
 				prot_commit_flush_ptes(vma, addr, pte, oldpte, ptent,
 					nr_ptes, /* idx = */ 0, /* set_write = */ false, tlb);
 			pages += nr_ptes;
-		} else if (is_swap_pte(oldpte)) {
+		} else if (pte_none(oldpte)) {
+			/*
+			 * Nobody plays with any none ptes besides
+			 * userfaultfd when applying the protections.
+			 */
+			if (likely(!uffd_wp))
+				continue;
+
+			if (userfaultfd_wp_use_markers(vma)) {
+				/*
+				 * For file-backed mem, we need to be able to
+				 * wr-protect a none pte, because even if the
+				 * pte is none, the page/swap cache could
+				 * exist.  Doing that by install a marker.
+				 */
+				set_pte_at(vma->vm_mm, addr, pte,
+					   make_pte_marker(PTE_MARKER_UFFD_WP));
+				pages++;
+			}
+		} else  {
 			swp_entry_t entry = pte_to_swp_entry(oldpte);
 			pte_t newpte;
 
@@ -406,28 +425,6 @@ static long change_pte_range(struct mmu_gather *tlb,
 				set_pte_at(vma->vm_mm, addr, pte, newpte);
 				pages++;
 			}
-		} else {
-			/* It must be an none page, or what else?.. */
-			WARN_ON_ONCE(!pte_none(oldpte));
-
-			/*
-			 * Nobody plays with any none ptes besides
-			 * userfaultfd when applying the protections.
-			 */
-			if (likely(!uffd_wp))
-				continue;
-
-			if (userfaultfd_wp_use_markers(vma)) {
-				/*
-				 * For file-backed mem, we need to be able to
-				 * wr-protect a none pte, because even if the
-				 * pte is none, the page/swap cache could
-				 * exist.  Doing that by install a marker.
-				 */
-				set_pte_at(vma->vm_mm, addr, pte,
-					   make_pte_marker(PTE_MARKER_UFFD_WP));
-				pages++;
-			}
 		}
 	} while (pte += nr_ptes, addr += nr_ptes * PAGE_SIZE, addr != end);
 	arch_leave_lazy_mmu_mode();
diff --git a/mm/mremap.c b/mm/mremap.c
index bd7314898ec5..f01c74add990 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -158,6 +158,9 @@ static void drop_rmap_locks(struct vm_area_struct *vma)
 
 static pte_t move_soft_dirty_pte(pte_t pte)
 {
+	if (pte_none(pte))
+		return pte;
+
 	/*
 	 * Set soft dirty bit so we can notice
 	 * in userspace the ptes were moved.
@@ -165,7 +168,7 @@ static pte_t move_soft_dirty_pte(pte_t pte)
 #ifdef CONFIG_MEM_SOFT_DIRTY
 	if (pte_present(pte))
 		pte = pte_mksoft_dirty(pte);
-	else if (is_swap_pte(pte))
+	else
 		pte = pte_swp_mksoft_dirty(pte);
 #endif
 	return pte;
@@ -294,7 +297,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
 			if (need_clear_uffd_wp) {
 				if (pte_present(pte))
 					pte = pte_clear_uffd_wp(pte);
-				else if (is_swap_pte(pte))
+				else
 					pte = pte_swp_clear_uffd_wp(pte);
 			}
 			set_ptes(mm, new_addr, new_ptep, pte, nr_ptes);
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 4eeca782b888..43f75d2f7c36 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -185,12 +185,15 @@ static inline bool swap_cached_writable(swp_entry_t entry)
 	       is_writable_migration_entry(entry);
 }
 
-static inline void page_table_check_pte_flags(pte_t pte)
+static void page_table_check_pte_flags(pte_t pte)
 {
-	if (pte_present(pte) && pte_uffd_wp(pte))
-		WARN_ON_ONCE(pte_write(pte));
-	else if (is_swap_pte(pte) && pte_swp_uffd_wp(pte))
-		WARN_ON_ONCE(swap_cached_writable(pte_to_swp_entry(pte)));
+	if (pte_present(pte)) {
+		WARN_ON_ONCE(pte_uffd_wp(pte) && pte_write(pte));
+	} else if (pte_swp_uffd_wp(pte)) {
+		const swp_entry_t entry = pte_to_swp_entry(pte);
+
+		WARN_ON_ONCE(swap_cached_writable(entry));
+	}
 }
 
 void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 75a8fbb788b7..2e5ac6572630 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -16,6 +16,7 @@ static inline bool not_found(struct page_vma_mapped_walk *pvmw)
 static bool map_pte(struct page_vma_mapped_walk *pvmw, pmd_t *pmdvalp,
 		    spinlock_t **ptlp)
 {
+	bool is_migration;
 	pte_t ptent;
 
 	if (pvmw->flags & PVMW_SYNC) {
@@ -26,6 +27,7 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw, pmd_t *pmdvalp,
 		return !!pvmw->pte;
 	}
 
+	is_migration = pvmw->flags & PVMW_MIGRATION;
 again:
 	/*
 	 * It is important to return the ptl corresponding to pte,
@@ -41,11 +43,14 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw, pmd_t *pmdvalp,
 
 	ptent = ptep_get(pvmw->pte);
 
-	if (pvmw->flags & PVMW_MIGRATION) {
-		if (!is_swap_pte(ptent))
+	if (pte_none(ptent)) {
+		return false;
+	} else if (pte_present(ptent)) {
+		if (is_migration)
 			return false;
-	} else if (is_swap_pte(ptent)) {
+	} else if (!is_migration) {
 		swp_entry_t entry;
+
 		/*
 		 * Handle un-addressable ZONE_DEVICE memory.
 		 *
@@ -66,8 +71,6 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw, pmd_t *pmdvalp,
 		if (!is_device_private_entry(entry) &&
 		    !is_device_exclusive_entry(entry))
 			return false;
-	} else if (!pte_present(ptent)) {
-		return false;
 	}
 	spin_lock(*ptlp);
 	if (unlikely(!pmd_same(*pmdvalp, pmdp_get_lockless(pvmw->pmd)))) {
@@ -107,27 +110,23 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw, unsigned long pte_nr)
 	pte_t ptent = ptep_get(pvmw->pte);
 
 	if (pvmw->flags & PVMW_MIGRATION) {
-		swp_entry_t entry = pte_to_swp_entry_or_zero(ptent);
+		const swp_entry_t entry = pte_to_swp_entry_or_zero(ptent);
 
 		if (!is_migration_entry(entry))
 			return false;
 
 		pfn = swp_offset_pfn(entry);
-	} else if (is_swap_pte(ptent)) {
-		swp_entry_t entry;
+	} else if (pte_present(ptent)) {
+		pfn = pte_pfn(ptent);
+	} else {
+		const swp_entry_t entry = pte_to_swp_entry(ptent);
 
 		/* Handle un-addressable ZONE_DEVICE memory */
-		entry = pte_to_swp_entry(ptent);
 		if (!is_device_private_entry(entry) &&
 		    !is_device_exclusive_entry(entry))
 			return false;
 
 		pfn = swp_offset_pfn(entry);
-	} else {
-		if (!pte_present(ptent))
-			return false;
-
-		pfn = pte_pfn(ptent);
 	}
 
 	if ((pfn + pte_nr - 1) < pvmw->pfn)
-- 
2.51.0



^ permalink raw reply	[flat|nested] 47+ messages in thread

* [RFC PATCH 03/12] mm: introduce get_pte_swap_entry() and use it
  2025-10-24  7:41 [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion Lorenzo Stoakes
  2025-10-24  7:41 ` [RFC PATCH 01/12] mm: introduce and use pte_to_swp_entry_or_zero() Lorenzo Stoakes
  2025-10-24  7:41 ` [RFC PATCH 02/12] mm: avoid unnecessary uses of is_swap_pte() Lorenzo Stoakes
@ 2025-10-24  7:41 ` Lorenzo Stoakes
  2025-10-24  7:41 ` [RFC PATCH 04/12] mm: use get_pte_swap_entry() in debug pgtable + remove is_swap_pte() Lorenzo Stoakes
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-24  7:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	David Hildenbrand, Alexander Gordeev, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Sven Schnelle, Zi Yan,
	Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, Pedro Falcato,
	Pasha Tatashin, Rik van Riel, Harry Yoo, kvm, linux-s390,
	linux-kernel, linux-fsdevel, linux-mm

We have a number of checks which explicitly check for 'true' swap entries,
that is swap entries which are not non-swap swap entries.

This is a confusing state of affairs, and we're duplicating checks as well
as using is_swap_pte() which is applied inconsistently throughout the code
base.

Avoid all this by introducing a new function, get_pte_swap_entry() that
explicitly checks for a true swap entry and returns it if the PTE contains
one.

We then update the code base to use this function.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 include/linux/swapops.h | 29 +++++++++++++++++++++++++++++
 mm/internal.h           |  6 +++---
 mm/madvise.c            |  5 +----
 mm/swap_state.c         |  5 +----
 mm/swapfile.c           |  3 +--
 5 files changed, 35 insertions(+), 13 deletions(-)

diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 24eaf8825c6b..a557b0e7f05c 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -649,5 +649,34 @@ static inline int is_pmd_non_present_folio_entry(pmd_t pmd)
 	return is_pmd_migration_entry(pmd) || is_pmd_device_private_entry(pmd);
 }
 
+/**
+ * get_pte_swap_entry() - Gets PTE swap entry if one is present.
+ * @pte: The PTE we are checking.
+ * @entryp: Output pointer to a swap entry that will be populated upon
+ * success.
+ *
+ * Determines if the PTE describes an entry in swap or swap cache (i.e. is a
+ * swap entry and not a non-swap entry), if so it sets @entryp to the swap
+ * entry.
+ *
+ * This should only be used if we do not have any prior knowledge of this
+ * PTE's state.
+ *
+ * Return: true if swappable, false otherwise.
+ */
+static inline bool get_pte_swap_entry(pte_t pte, swp_entry_t *entryp)
+{
+	if (pte_present(pte))
+		return false;
+	if (pte_none(pte))
+		return false;
+
+	*entryp = pte_to_swp_entry(pte);
+	if (non_swap_entry(*entryp))
+		return false;
+
+	return true;
+}
+
 #endif /* CONFIG_MMU */
 #endif /* _LINUX_SWAPOPS_H */
diff --git a/mm/internal.h b/mm/internal.h
index b855a4412878..78dcf6048672 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -378,15 +378,15 @@ static inline pte_t pte_next_swp_offset(pte_t pte)
  */
 static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte)
 {
+	swp_entry_t entry;
+	const bool __maybe_unused is_swap = get_pte_swap_entry(pte, &entry);
 	pte_t expected_pte = pte_next_swp_offset(pte);
 	const pte_t *end_ptep = start_ptep + max_nr;
-	swp_entry_t entry = pte_to_swp_entry(pte);
 	pte_t *ptep = start_ptep + 1;
 	unsigned short cgroup_id;
 
 	VM_WARN_ON(max_nr < 1);
-	VM_WARN_ON(!is_swap_pte(pte));
-	VM_WARN_ON(non_swap_entry(entry));
+	VM_WARN_ON(!is_swap);
 
 	cgroup_id = lookup_swap_cgroup_id(entry);
 	while (ptep < end_ptep) {
diff --git a/mm/madvise.c b/mm/madvise.c
index f9f80b2e9d43..578036ef6675 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -205,10 +205,7 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
 		}
 
 		pte = ptep_get(ptep);
-		if (!is_swap_pte(pte))
-			continue;
-		entry = pte_to_swp_entry(pte);
-		if (unlikely(non_swap_entry(entry)))
+		if (!get_pte_swap_entry(pte, &entry))
 			continue;
 
 		pte_unmap_unlock(ptep, ptl);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index b13e9c4baa90..9199b64206ff 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -754,10 +754,7 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 				break;
 		}
 		pentry = ptep_get_lockless(pte);
-		if (!is_swap_pte(pentry))
-			continue;
-		entry = pte_to_swp_entry(pentry);
-		if (unlikely(non_swap_entry(entry)))
+		if (!get_pte_swap_entry(pentry, &entry))
 			continue;
 		pte_unmap(pte);
 		pte = NULL;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index cb2392ed8e0e..74eb9221a220 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2253,10 +2253,9 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 
 		ptent = ptep_get_lockless(pte);
 
-		if (!is_swap_pte(ptent))
+		if (!get_pte_swap_entry(ptent, &entry))
 			continue;
 
-		entry = pte_to_swp_entry(ptent);
 		if (swp_type(entry) != type)
 			continue;
 
-- 
2.51.0



^ permalink raw reply	[flat|nested] 47+ messages in thread

* [RFC PATCH 04/12] mm: use get_pte_swap_entry() in debug pgtable + remove is_swap_pte()
  2025-10-24  7:41 [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion Lorenzo Stoakes
                   ` (2 preceding siblings ...)
  2025-10-24  7:41 ` [RFC PATCH 03/12] mm: introduce get_pte_swap_entry() and use it Lorenzo Stoakes
@ 2025-10-24  7:41 ` Lorenzo Stoakes
  2025-10-24  7:41 ` [RFC PATCH 05/12] fs/proc/task_mmu: refactor pagemap_pmd_range() Lorenzo Stoakes
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-24  7:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	David Hildenbrand, Alexander Gordeev, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Sven Schnelle, Zi Yan,
	Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, Pedro Falcato,
	Pasha Tatashin, Rik van Riel, Harry Yoo, kvm, linux-s390,
	linux-kernel, linux-fsdevel, linux-mm

Replace invocations of is_swap_pte() with get_pte_swap_entry() in
mm/debug_vm_pgtable.c.

We update the test code to use a 'true' swap entry throughout so we are
guaranteed this is not a non-swap entry, so all asserts continue to operate
correctly.

With this change in place, we no longer use is_swap_pte() anywhere, so
remove it.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 include/linux/swapops.h |  6 ------
 mm/debug_vm_pgtable.c   | 18 +++++++++---------
 2 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index a557b0e7f05c..728e27e834be 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -120,12 +120,6 @@ static inline unsigned long swp_offset_pfn(swp_entry_t entry)
 	return swp_offset(entry) & SWP_PFN_MASK;
 }
 
-/* check whether a pte points to a swap entry */
-static inline int is_swap_pte(pte_t pte)
-{
-	return !pte_none(pte) && !pte_present(pte);
-}
-
 /*
  * Convert the arch-dependent pte representation of a swp_entry_t into an
  * arch-independent swp_entry_t.
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 830107b6dd08..d4b2835569ce 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -701,13 +701,14 @@ static void __init pte_soft_dirty_tests(struct pgtable_debug_args *args)
 static void __init pte_swap_soft_dirty_tests(struct pgtable_debug_args *args)
 {
 	pte_t pte;
+	swp_entry_t entry;
 
 	if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
 		return;
 
 	pr_debug("Validating PTE swap soft dirty\n");
 	pte = swp_entry_to_pte(args->swp_entry);
-	WARN_ON(!is_swap_pte(pte));
+	WARN_ON(!get_pte_swap_entry(pte, &entry));
 
 	WARN_ON(!pte_swp_soft_dirty(pte_swp_mksoft_dirty(pte)));
 	WARN_ON(pte_swp_soft_dirty(pte_swp_clear_soft_dirty(pte)));
@@ -763,20 +764,18 @@ static void __init pte_swap_exclusive_tests(struct pgtable_debug_args *args)
 
 	pte = swp_entry_to_pte(entry);
 	WARN_ON(pte_swp_exclusive(pte));
-	WARN_ON(!is_swap_pte(pte));
-	entry2 = pte_to_swp_entry(pte);
+	WARN_ON(!get_pte_swap_entry(pte, &entry2));
 	WARN_ON(memcmp(&entry, &entry2, sizeof(entry)));
 
 	pte = pte_swp_mkexclusive(pte);
 	WARN_ON(!pte_swp_exclusive(pte));
-	WARN_ON(!is_swap_pte(pte));
+	WARN_ON(!get_pte_swap_entry(pte, &entry2));
 	WARN_ON(pte_swp_soft_dirty(pte));
-	entry2 = pte_to_swp_entry(pte);
 	WARN_ON(memcmp(&entry, &entry2, sizeof(entry)));
 
 	pte = pte_swp_clear_exclusive(pte);
 	WARN_ON(pte_swp_exclusive(pte));
-	WARN_ON(!is_swap_pte(pte));
+	WARN_ON(!get_pte_swap_entry(pte, &entry2));
 	entry2 = pte_to_swp_entry(pte);
 	WARN_ON(memcmp(&entry, &entry2, sizeof(entry)));
 }
@@ -784,11 +783,12 @@ static void __init pte_swap_exclusive_tests(struct pgtable_debug_args *args)
 static void __init pte_swap_tests(struct pgtable_debug_args *args)
 {
 	swp_entry_t arch_entry;
+	swp_entry_t entry;
 	pte_t pte1, pte2;
 
 	pr_debug("Validating PTE swap\n");
 	pte1 = swp_entry_to_pte(args->swp_entry);
-	WARN_ON(!is_swap_pte(pte1));
+	WARN_ON(!get_pte_swap_entry(pte1, &entry));
 
 	arch_entry = __pte_to_swp_entry(pte1);
 	pte2 = __swp_entry_to_pte(arch_entry);
@@ -1205,8 +1205,8 @@ static int __init init_args(struct pgtable_debug_args *args)
 
 	/* See generic_max_swapfile_size(): probe the maximum offset */
 	max_swap_offset = swp_offset(pte_to_swp_entry(swp_entry_to_pte(swp_entry(0, ~0UL))));
-	/* Create a swp entry with all possible bits set */
-	args->swp_entry = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset);
+	/* Create a swp entry with all possible bits set while still being swap. */
+	args->swp_entry = swp_entry(MAX_SWAPFILES - 1, max_swap_offset);
 
 	/*
 	 * Allocate (huge) pages because some of the tests need to access
-- 
2.51.0



^ permalink raw reply	[flat|nested] 47+ messages in thread

* [RFC PATCH 05/12] fs/proc/task_mmu: refactor pagemap_pmd_range()
  2025-10-24  7:41 [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion Lorenzo Stoakes
                   ` (3 preceding siblings ...)
  2025-10-24  7:41 ` [RFC PATCH 04/12] mm: use get_pte_swap_entry() in debug pgtable + remove is_swap_pte() Lorenzo Stoakes
@ 2025-10-24  7:41 ` Lorenzo Stoakes
  2025-10-24 17:32   ` Gregory Price
  2025-10-24  7:41 ` [RFC PATCH 06/12] mm: avoid unnecessary use of is_swap_pmd() Lorenzo Stoakes
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-24  7:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	David Hildenbrand, Alexander Gordeev, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Sven Schnelle, Zi Yan,
	Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, Pedro Falcato,
	Pasha Tatashin, Rik van Riel, Harry Yoo, kvm, linux-s390,
	linux-kernel, linux-fsdevel, linux-mm

Separate out THP logic so we can drop an indentation level and reduce the
amount of noise in this function.

We add pagemap_pmd_range_thp() for this purpose.

While we're here, convert the VM_BUG_ON() to a VM_WARN_ON_ONCE() at the
same time.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 fs/proc/task_mmu.c | 146 +++++++++++++++++++++++----------------------
 1 file changed, 76 insertions(+), 70 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 5475acfa1a33..3c8be2579253 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1984,91 +1984,97 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 	return make_pme(frame, flags);
 }
 
-static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
-			     struct mm_walk *walk)
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static int pagemap_pmd_range_thp(pmd_t *pmdp, unsigned long addr,
+		unsigned long end, struct vm_area_struct *vma,
+		struct pagemapread *pm, spinlock_t *ptl)
 {
-	struct vm_area_struct *vma = walk->vma;
-	struct pagemapread *pm = walk->private;
-	spinlock_t *ptl;
-	pte_t *pte, *orig_pte;
+	unsigned int idx = (addr & ~PMD_MASK) >> PAGE_SHIFT;
+	u64 flags = 0, frame = 0;
+	pmd_t pmd = *pmdp;
+	struct page *page = NULL;
+	struct folio *folio = NULL;
 	int err = 0;
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 
-	ptl = pmd_trans_huge_lock(pmdp, vma);
-	if (ptl) {
-		unsigned int idx = (addr & ~PMD_MASK) >> PAGE_SHIFT;
-		u64 flags = 0, frame = 0;
-		pmd_t pmd = *pmdp;
-		struct page *page = NULL;
-		struct folio *folio = NULL;
+	if (vma->vm_flags & VM_SOFTDIRTY)
+		flags |= PM_SOFT_DIRTY;
 
-		if (vma->vm_flags & VM_SOFTDIRTY)
-			flags |= PM_SOFT_DIRTY;
+	if (pmd_present(pmd)) {
+		page = pmd_page(pmd);
 
-		if (pmd_present(pmd)) {
-			page = pmd_page(pmd);
+		flags |= PM_PRESENT;
+		if (pmd_soft_dirty(pmd))
+			flags |= PM_SOFT_DIRTY;
+		if (pmd_uffd_wp(pmd))
+			flags |= PM_UFFD_WP;
+		if (pm->show_pfn)
+			frame = pmd_pfn(pmd) + idx;
+	} else if (thp_migration_supported() && is_swap_pmd(pmd)) {
+		swp_entry_t entry = pmd_to_swp_entry(pmd);
+		unsigned long offset;
 
-			flags |= PM_PRESENT;
-			if (pmd_soft_dirty(pmd))
-				flags |= PM_SOFT_DIRTY;
-			if (pmd_uffd_wp(pmd))
-				flags |= PM_UFFD_WP;
-			if (pm->show_pfn)
-				frame = pmd_pfn(pmd) + idx;
-		}
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
-		else if (is_swap_pmd(pmd)) {
-			swp_entry_t entry = pmd_to_swp_entry(pmd);
-			unsigned long offset;
-
-			if (pm->show_pfn) {
-				if (is_pfn_swap_entry(entry))
-					offset = swp_offset_pfn(entry) + idx;
-				else
-					offset = swp_offset(entry) + idx;
-				frame = swp_type(entry) |
-					(offset << MAX_SWAPFILES_SHIFT);
-			}
-			flags |= PM_SWAP;
-			if (pmd_swp_soft_dirty(pmd))
-				flags |= PM_SOFT_DIRTY;
-			if (pmd_swp_uffd_wp(pmd))
-				flags |= PM_UFFD_WP;
-			VM_BUG_ON(!is_pmd_migration_entry(pmd));
-			page = pfn_swap_entry_to_page(entry);
+		if (pm->show_pfn) {
+			if (is_pfn_swap_entry(entry))
+				offset = swp_offset_pfn(entry) + idx;
+			else
+				offset = swp_offset(entry) + idx;
+			frame = swp_type(entry) |
+				(offset << MAX_SWAPFILES_SHIFT);
 		}
-#endif
+		flags |= PM_SWAP;
+		if (pmd_swp_soft_dirty(pmd))
+			flags |= PM_SOFT_DIRTY;
+		if (pmd_swp_uffd_wp(pmd))
+			flags |= PM_UFFD_WP;
+		VM_WARN_ON_ONCE(!is_pmd_migration_entry(pmd));
+		page = pfn_swap_entry_to_page(entry);
+	}
 
-		if (page) {
-			folio = page_folio(page);
-			if (!folio_test_anon(folio))
-				flags |= PM_FILE;
-		}
+	if (page) {
+		folio = page_folio(page);
+		if (!folio_test_anon(folio))
+			flags |= PM_FILE;
+	}
 
-		for (; addr != end; addr += PAGE_SIZE, idx++) {
-			u64 cur_flags = flags;
-			pagemap_entry_t pme;
+	for (; addr != end; addr += PAGE_SIZE, idx++) {
+		u64 cur_flags = flags;
+		pagemap_entry_t pme;
 
-			if (folio && (flags & PM_PRESENT) &&
-			    __folio_page_mapped_exclusively(folio, page))
-				cur_flags |= PM_MMAP_EXCLUSIVE;
+		if (folio && (flags & PM_PRESENT) &&
+		    __folio_page_mapped_exclusively(folio, page))
+			cur_flags |= PM_MMAP_EXCLUSIVE;
 
-			pme = make_pme(frame, cur_flags);
-			err = add_to_pagemap(&pme, pm);
-			if (err)
-				break;
-			if (pm->show_pfn) {
-				if (flags & PM_PRESENT)
-					frame++;
-				else if (flags & PM_SWAP)
-					frame += (1 << MAX_SWAPFILES_SHIFT);
-			}
+		pme = make_pme(frame, cur_flags);
+		err = add_to_pagemap(&pme, pm);
+		if (err)
+			break;
+		if (pm->show_pfn) {
+			if (flags & PM_PRESENT)
+				frame++;
+			else if (flags & PM_SWAP)
+				frame += (1 << MAX_SWAPFILES_SHIFT);
 		}
-		spin_unlock(ptl);
-		return err;
 	}
+	spin_unlock(ptl);
+	return err;
+}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
+			     struct mm_walk *walk)
+{
+	struct vm_area_struct *vma = walk->vma;
+	struct pagemapread *pm = walk->private;
+	spinlock_t *ptl;
+	pte_t *pte, *orig_pte;
+	int err = 0;
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	ptl = pmd_trans_huge_lock(pmdp, vma);
+	if (ptl)
+		return pagemap_pmd_range_thp(pmdp, addr, end, vma, pm, ptl);
+#endif
+
 	/*
 	 * We can assume that @vma always points to a valid one and @end never
 	 * goes beyond vma->vm_end.
-- 
2.51.0



^ permalink raw reply	[flat|nested] 47+ messages in thread

* [RFC PATCH 06/12] mm: avoid unnecessary use of is_swap_pmd()
  2025-10-24  7:41 [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion Lorenzo Stoakes
                   ` (4 preceding siblings ...)
  2025-10-24  7:41 ` [RFC PATCH 05/12] fs/proc/task_mmu: refactor pagemap_pmd_range() Lorenzo Stoakes
@ 2025-10-24  7:41 ` Lorenzo Stoakes
  2025-10-24  7:41 ` [RFC PATCH 07/12] mm: introduce is_huge_pmd() and use where appropriate Lorenzo Stoakes
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-24  7:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	David Hildenbrand, Alexander Gordeev, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Sven Schnelle, Zi Yan,
	Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, Pedro Falcato,
	Pasha Tatashin, Rik van Riel, Harry Yoo, kvm, linux-s390,
	linux-kernel, linux-fsdevel, linux-mm

PMD 'non-swap' swap entries are currently used for PMD-level migration
entries and device private entries.

To add to the confusion in this terminology we use is_swap_pmd() in an
inconsistent way similar to how is_swap_pte() was being used - sometimes
adopting the convention that pmd_none(), !pmd_present() implies PMD 'swap'
entry, sometimes not.

This patch handles the low-hanging fruit of cases where we can simply
substitute other predicates for is_swap_pmd().

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 fs/proc/task_mmu.c      | 15 ++++++++++---
 include/linux/swapops.h | 16 +++++++++++--
 mm/huge_memory.c        |  4 +++-
 mm/memory.c             | 50 +++++++++++++++++++++++------------------
 mm/page_table_check.c   | 12 ++++++----
 5 files changed, 65 insertions(+), 32 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 3c8be2579253..1c32a0e2b965 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1059,10 +1059,12 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
 	bool present = false;
 	struct folio *folio;
 
+	if (pmd_none(*pmd))
+		return;
 	if (pmd_present(*pmd)) {
 		page = vm_normal_page_pmd(vma, addr, *pmd);
 		present = true;
-	} else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
+	} else if (unlikely(thp_migration_supported())) {
 		swp_entry_t entry = pmd_to_swp_entry(*pmd);
 
 		if (is_pfn_swap_entry(entry))
@@ -1999,6 +2001,9 @@ static int pagemap_pmd_range_thp(pmd_t *pmdp, unsigned long addr,
 	if (vma->vm_flags & VM_SOFTDIRTY)
 		flags |= PM_SOFT_DIRTY;
 
+	if (pmd_none(pmd))
+		goto populate_pagemap;
+
 	if (pmd_present(pmd)) {
 		page = pmd_page(pmd);
 
@@ -2009,7 +2014,7 @@ static int pagemap_pmd_range_thp(pmd_t *pmdp, unsigned long addr,
 			flags |= PM_UFFD_WP;
 		if (pm->show_pfn)
 			frame = pmd_pfn(pmd) + idx;
-	} else if (thp_migration_supported() && is_swap_pmd(pmd)) {
+	} else if (thp_migration_supported()) {
 		swp_entry_t entry = pmd_to_swp_entry(pmd);
 		unsigned long offset;
 
@@ -2036,6 +2041,7 @@ static int pagemap_pmd_range_thp(pmd_t *pmdp, unsigned long addr,
 			flags |= PM_FILE;
 	}
 
+populate_pagemap:
 	for (; addr != end; addr += PAGE_SIZE, idx++) {
 		u64 cur_flags = flags;
 		pagemap_entry_t pme;
@@ -2396,6 +2402,9 @@ static unsigned long pagemap_thp_category(struct pagemap_scan_private *p,
 {
 	unsigned long categories = PAGE_IS_HUGE;
 
+	if (pmd_none(pmd))
+		return categories;
+
 	if (pmd_present(pmd)) {
 		struct page *page;
 
@@ -2413,7 +2422,7 @@ static unsigned long pagemap_thp_category(struct pagemap_scan_private *p,
 			categories |= PAGE_IS_PFNZERO;
 		if (pmd_soft_dirty(pmd))
 			categories |= PAGE_IS_SOFT_DIRTY;
-	} else if (is_swap_pmd(pmd)) {
+	} else {
 		swp_entry_t swp;
 
 		categories |= PAGE_IS_SWAPPED;
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 728e27e834be..8642e590504a 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -573,7 +573,13 @@ static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
 
 static inline int is_pmd_migration_entry(pmd_t pmd)
 {
-	return is_swap_pmd(pmd) && is_migration_entry(pmd_to_swp_entry(pmd));
+	swp_entry_t entry;
+
+	if (pmd_present(pmd))
+		return 0;
+
+	entry = pmd_to_swp_entry(pmd);
+	return is_migration_entry(entry);
 }
 #else  /* CONFIG_ARCH_ENABLE_THP_MIGRATION */
 static inline int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
@@ -621,7 +627,13 @@ static inline int is_pmd_migration_entry(pmd_t pmd)
  */
 static inline int is_pmd_device_private_entry(pmd_t pmd)
 {
-	return is_swap_pmd(pmd) && is_device_private_entry(pmd_to_swp_entry(pmd));
+	swp_entry_t entry;
+
+	if (pmd_present(pmd))
+		return 0;
+
+	entry = pmd_to_swp_entry(pmd);
+	return is_device_private_entry(entry);
 }
 
 #else /* CONFIG_ZONE_DEVICE && CONFIG_ARCH_ENABLE_THP_MIGRATION */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 370ecfd6a182..a59718f85ec3 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2428,9 +2428,11 @@ static pmd_t move_soft_dirty_pmd(pmd_t pmd)
 
 static pmd_t clear_uffd_wp_pmd(pmd_t pmd)
 {
+	if (pmd_none(pmd))
+		return pmd;
 	if (pmd_present(pmd))
 		pmd = pmd_clear_uffd_wp(pmd);
-	else if (is_swap_pmd(pmd))
+	else
 		pmd = pmd_swp_clear_uffd_wp(pmd);
 
 	return pmd;
diff --git a/mm/memory.c b/mm/memory.c
index 19615bcf234f..83828548ef5f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1375,6 +1375,7 @@ copy_pmd_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 		next = pmd_addr_end(addr, end);
 		if (is_swap_pmd(*src_pmd) || pmd_trans_huge(*src_pmd)) {
 			int err;
+
 			VM_BUG_ON_VMA(next-addr != HPAGE_PMD_SIZE, src_vma);
 			err = copy_huge_pmd(dst_mm, src_mm, dst_pmd, src_pmd,
 					    addr, dst_vma, src_vma);
@@ -6331,35 +6332,40 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 	if (pmd_none(*vmf.pmd) &&
 	    thp_vma_allowable_order(vma, vm_flags, TVA_PAGEFAULT, PMD_ORDER)) {
 		ret = create_huge_pmd(&vmf);
-		if (!(ret & VM_FAULT_FALLBACK))
+		if (ret & VM_FAULT_FALLBACK)
+			goto fallback;
+		else
 			return ret;
-	} else {
-		vmf.orig_pmd = pmdp_get_lockless(vmf.pmd);
+	}
 
-		if (unlikely(is_swap_pmd(vmf.orig_pmd))) {
-			if (is_pmd_device_private_entry(vmf.orig_pmd))
-				return do_huge_pmd_device_private(&vmf);
+	vmf.orig_pmd = pmdp_get_lockless(vmf.pmd);
+	if (pmd_none(vmf.orig_pmd))
+		goto fallback;
 
-			if (is_pmd_migration_entry(vmf.orig_pmd))
-				pmd_migration_entry_wait(mm, vmf.pmd);
-			return 0;
-		}
-		if (pmd_trans_huge(vmf.orig_pmd)) {
-			if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma))
-				return do_huge_pmd_numa_page(&vmf);
+	if (unlikely(!pmd_present(vmf.orig_pmd))) {
+		if (is_pmd_device_private_entry(vmf.orig_pmd))
+			return do_huge_pmd_device_private(&vmf);
 
-			if ((flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) &&
-			    !pmd_write(vmf.orig_pmd)) {
-				ret = wp_huge_pmd(&vmf);
-				if (!(ret & VM_FAULT_FALLBACK))
-					return ret;
-			} else {
-				huge_pmd_set_accessed(&vmf);
-				return 0;
-			}
+		if (is_pmd_migration_entry(vmf.orig_pmd))
+			pmd_migration_entry_wait(mm, vmf.pmd);
+		return 0;
+	}
+	if (pmd_trans_huge(vmf.orig_pmd)) {
+		if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma))
+			return do_huge_pmd_numa_page(&vmf);
+
+		if ((flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) &&
+		    !pmd_write(vmf.orig_pmd)) {
+			ret = wp_huge_pmd(&vmf);
+			if (!(ret & VM_FAULT_FALLBACK))
+				return ret;
+		} else {
+			huge_pmd_set_accessed(&vmf);
+			return 0;
 		}
 	}
 
+fallback:
 	return handle_pte_fault(&vmf);
 }
 
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 43f75d2f7c36..f5f25e120f69 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -215,10 +215,14 @@ EXPORT_SYMBOL(__page_table_check_ptes_set);
 
 static inline void page_table_check_pmd_flags(pmd_t pmd)
 {
-	if (pmd_present(pmd) && pmd_uffd_wp(pmd))
-		WARN_ON_ONCE(pmd_write(pmd));
-	else if (is_swap_pmd(pmd) && pmd_swp_uffd_wp(pmd))
-		WARN_ON_ONCE(swap_cached_writable(pmd_to_swp_entry(pmd)));
+	if (pmd_present(pmd)) {
+		if (pmd_uffd_wp(pmd))
+			WARN_ON_ONCE(pmd_write(pmd));
+	} else if (pmd_swp_uffd_wp(pmd)) {
+		swp_entry_t entry = pmd_to_swp_entry(pmd);
+
+		WARN_ON_ONCE(swap_cached_writable(entry));
+	}
 }
 
 void __page_table_check_pmds_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd,
-- 
2.51.0



^ permalink raw reply	[flat|nested] 47+ messages in thread

* [RFC PATCH 07/12] mm: introduce is_huge_pmd() and use where appropriate
  2025-10-24  7:41 [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion Lorenzo Stoakes
                   ` (5 preceding siblings ...)
  2025-10-24  7:41 ` [RFC PATCH 06/12] mm: avoid unnecessary use of is_swap_pmd() Lorenzo Stoakes
@ 2025-10-24  7:41 ` Lorenzo Stoakes
  2025-10-24  7:41 ` [RFC PATCH 08/12] mm/huge_memory: refactor copy_huge_pmd() non-present logic Lorenzo Stoakes
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-24  7:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	David Hildenbrand, Alexander Gordeev, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Sven Schnelle, Zi Yan,
	Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, Pedro Falcato,
	Pasha Tatashin, Rik van Riel, Harry Yoo, kvm, linux-s390,
	linux-kernel, linux-fsdevel, linux-mm

The 'swap' PMD case is confusing as only non-present entries (i.e. non-swap
'swap' entries) are valid at PMD level- currently migration entries and
device private entries.

We repeatedly perform checks of the form is_swap_pmd() || pmd_trans_huge()
which is itself confusing - it implies that swap entries at PMD level exist
and are different from huge entries.

Address this confusion by introduced is_huge_pmd() which checks for either
case. Sadly due to header dependency issues (huge_mm.h is included very
early on in headers and cannot really rely on much else) we cannot assert
here that is_pmd_non_present_folio_entry() is true for the input PMD if
non-present.

However since these are the only valid, handled cases the function is still
achieving what it intends to do.

We then replace all instances of is_swap_pmd() || pmd_trans_huge() with
is_huge_pmd() invocations and adjust logic accordingly to accommodate
this.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 include/linux/huge_mm.h | 40 ++++++++++++++++++++++++++++++++++++----
 mm/huge_memory.c        |  3 ++-
 mm/memory.c             |  4 ++--
 mm/mprotect.c           |  2 +-
 mm/mremap.c             |  2 +-
 5 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7698b3542c4f..892cb825dfc7 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -416,10 +416,37 @@ void reparent_deferred_split_queue(struct mem_cgroup *memcg);
 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		unsigned long address, bool freeze);
 
+/**
+ * is_huge_pmd() - Is this PMD either a huge PMD entry or a huge non-present
+ * entry?
+ * @pmd: The PMD to check.
+ *
+ * A huge PMD entry is a non-empty entry which is present and marked huge or a
+ * huge non-present entry. This check be performed without the appropriate locks
+ * held, in which case the condition should be rechecked after they are
+ * acquired.
+ *
+ * Returns: true if this PMD is huge, false otherwise.
+ */
+static inline bool is_huge_pmd(pmd_t pmd)
+{
+	if (pmd_present(pmd)) {
+		return pmd_trans_huge(pmd);
+	} else if (!pmd_none(pmd)) {
+		/*
+		 * Non-present PMDs must be valid huge non-present entries. We
+		 * cannot assert that here due to header dependency issues.
+		 */
+		return true;
+	}
+
+	return false;
+}
+
 #define split_huge_pmd(__vma, __pmd, __address)				\
 	do {								\
 		pmd_t *____pmd = (__pmd);				\
-		if (is_swap_pmd(*____pmd) || pmd_trans_huge(*____pmd))	\
+		if (is_huge_pmd(*____pmd))				\
 			__split_huge_pmd(__vma, __pmd, __address,	\
 					 false);			\
 	}  while (0)
@@ -466,10 +493,10 @@ static inline int is_swap_pmd(pmd_t pmd)
 static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
 		struct vm_area_struct *vma)
 {
-	if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd))
+	if (is_huge_pmd(*pmd))
 		return __pmd_trans_huge_lock(pmd, vma);
-	else
-		return NULL;
+
+	return NULL;
 }
 static inline spinlock_t *pud_trans_huge_lock(pud_t *pud,
 		struct vm_area_struct *vma)
@@ -734,6 +761,11 @@ static inline struct folio *get_persistent_huge_zero_folio(void)
 {
 	return NULL;
 }
+
+static inline bool is_huge_pmd(pmd_t pmd)
+{
+	return false;
+}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 static inline int split_folio_to_list_to_order(struct folio *folio,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a59718f85ec3..e0fe1c3e01c9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2799,8 +2799,9 @@ int move_pages_huge_pmd(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, pm
 spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma)
 {
 	spinlock_t *ptl;
+
 	ptl = pmd_lock(vma->vm_mm, pmd);
-	if (likely(is_swap_pmd(*pmd) || pmd_trans_huge(*pmd)))
+	if (likely(is_huge_pmd(*pmd)))
 		return ptl;
 	spin_unlock(ptl);
 	return NULL;
diff --git a/mm/memory.c b/mm/memory.c
index 83828548ef5f..cc163060933f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1373,7 +1373,7 @@ copy_pmd_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	src_pmd = pmd_offset(src_pud, addr);
 	do {
 		next = pmd_addr_end(addr, end);
-		if (is_swap_pmd(*src_pmd) || pmd_trans_huge(*src_pmd)) {
+		if (is_huge_pmd(*src_pmd)) {
 			int err;
 
 			VM_BUG_ON_VMA(next-addr != HPAGE_PMD_SIZE, src_vma);
@@ -1921,7 +1921,7 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
 	pmd = pmd_offset(pud, addr);
 	do {
 		next = pmd_addr_end(addr, end);
-		if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd)) {
+		if (is_huge_pmd(*pmd)) {
 			if (next - addr != HPAGE_PMD_SIZE)
 				__split_huge_pmd(vma, pmd, addr, false);
 			else if (zap_huge_pmd(tlb, vma, pmd, addr)) {
diff --git a/mm/mprotect.c b/mm/mprotect.c
index e25ac9835cc2..c1d640758b60 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -522,7 +522,7 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 			goto next;
 
 		_pmd = pmdp_get_lockless(pmd);
-		if (is_swap_pmd(_pmd) || pmd_trans_huge(_pmd)) {
+		if (is_huge_pmd(_pmd)) {
 			if ((next - addr != HPAGE_PMD_SIZE) ||
 			    pgtable_split_needed(vma, cp_flags)) {
 				__split_huge_pmd(vma, pmd, addr, false);
diff --git a/mm/mremap.c b/mm/mremap.c
index f01c74add990..070ba5a85824 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -850,7 +850,7 @@ unsigned long move_page_tables(struct pagetable_move_control *pmc)
 		if (!new_pmd)
 			break;
 again:
-		if (is_swap_pmd(*old_pmd) || pmd_trans_huge(*old_pmd)) {
+		if (is_huge_pmd(*old_pmd)) {
 			if (extent == HPAGE_PMD_SIZE &&
 			    move_pgt_entry(pmc, HPAGE_PMD, old_pmd, new_pmd))
 				continue;
-- 
2.51.0



^ permalink raw reply	[flat|nested] 47+ messages in thread

* [RFC PATCH 08/12] mm/huge_memory: refactor copy_huge_pmd() non-present logic
  2025-10-24  7:41 [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion Lorenzo Stoakes
                   ` (6 preceding siblings ...)
  2025-10-24  7:41 ` [RFC PATCH 07/12] mm: introduce is_huge_pmd() and use where appropriate Lorenzo Stoakes
@ 2025-10-24  7:41 ` Lorenzo Stoakes
  2025-10-24  7:41 ` [RFC PATCH 09/12] mm/huge_memory: refactor change_huge_pmd() " Lorenzo Stoakes
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-24  7:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	David Hildenbrand, Alexander Gordeev, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Sven Schnelle, Zi Yan,
	Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, Pedro Falcato,
	Pasha Tatashin, Rik van Riel, Harry Yoo, kvm, linux-s390,
	linux-kernel, linux-fsdevel, linux-mm

Right now we are inconsistent in our use of thp_migration_supported():

static inline bool thp_migration_supported(void)
{
	return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);
}

And simply having arbitrary and ugly #ifdef
CONFIG_ARCH_ENABLE_THP_MIGRATION blocks in code.

This is exhibited in copy_huge_pmd(), which inserts a large #ifdef
CONFIG_ARCH_ENABLE_THP_MIGRATION block and an if-branch which is difficult
to follow

It's difficult to follow the logic of such a large function and the
non-present PMD logic is clearly separate as it sits in a giant if-branch.

Therefore this patch both separates out the logic and utilises
thp_migration_supported().

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/huge_memory.c | 109 +++++++++++++++++++++++++----------------------
 1 file changed, 59 insertions(+), 50 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e0fe1c3e01c9..755d38f82aec 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1773,6 +1773,62 @@ void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
 		update_mmu_cache_pmd(vma, addr, pmd);
 }
 
+static void copy_huge_non_present_pmd(
+		struct mm_struct *dst_mm, struct mm_struct *src_mm,
+		pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
+		struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
+		pmd_t pmd, pgtable_t pgtable)
+{
+	swp_entry_t entry = pmd_to_swp_entry(pmd);
+	struct folio *src_folio;
+
+	VM_WARN_ON(!is_pmd_non_present_folio_entry(pmd));
+
+	if (is_writable_migration_entry(entry) ||
+	    is_readable_exclusive_migration_entry(entry)) {
+		entry = make_readable_migration_entry(swp_offset(entry));
+		pmd = swp_entry_to_pmd(entry);
+		if (pmd_swp_soft_dirty(*src_pmd))
+			pmd = pmd_swp_mksoft_dirty(pmd);
+		if (pmd_swp_uffd_wp(*src_pmd))
+			pmd = pmd_swp_mkuffd_wp(pmd);
+		set_pmd_at(src_mm, addr, src_pmd, pmd);
+	} else if (is_device_private_entry(entry)) {
+		/*
+		 * For device private entries, since there are no
+		 * read exclusive entries, writable = !readable
+		 */
+		if (is_writable_device_private_entry(entry)) {
+			entry = make_readable_device_private_entry(swp_offset(entry));
+			pmd = swp_entry_to_pmd(entry);
+
+			if (pmd_swp_soft_dirty(*src_pmd))
+				pmd = pmd_swp_mksoft_dirty(pmd);
+			if (pmd_swp_uffd_wp(*src_pmd))
+				pmd = pmd_swp_mkuffd_wp(pmd);
+			set_pmd_at(src_mm, addr, src_pmd, pmd);
+		}
+
+		src_folio = pfn_swap_entry_folio(entry);
+		VM_WARN_ON(!folio_test_large(src_folio));
+
+		folio_get(src_folio);
+		/*
+		 * folio_try_dup_anon_rmap_pmd does not fail for
+		 * device private entries.
+		 */
+		folio_try_dup_anon_rmap_pmd(src_folio, &src_folio->page,
+					    dst_vma, src_vma);
+	}
+
+	add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+	mm_inc_nr_ptes(dst_mm);
+	pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
+	if (!userfaultfd_wp(dst_vma))
+		pmd = pmd_swp_clear_uffd_wp(pmd);
+	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
+}
+
 int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		  pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
 		  struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
@@ -1818,59 +1874,12 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	ret = -EAGAIN;
 	pmd = *src_pmd;
 
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
-	if (unlikely(is_swap_pmd(pmd))) {
-		swp_entry_t entry = pmd_to_swp_entry(pmd);
-
-		VM_WARN_ON(!is_pmd_non_present_folio_entry(pmd));
-
-		if (is_writable_migration_entry(entry) ||
-		    is_readable_exclusive_migration_entry(entry)) {
-			entry = make_readable_migration_entry(swp_offset(entry));
-			pmd = swp_entry_to_pmd(entry);
-			if (pmd_swp_soft_dirty(*src_pmd))
-				pmd = pmd_swp_mksoft_dirty(pmd);
-			if (pmd_swp_uffd_wp(*src_pmd))
-				pmd = pmd_swp_mkuffd_wp(pmd);
-			set_pmd_at(src_mm, addr, src_pmd, pmd);
-		} else if (is_device_private_entry(entry)) {
-			/*
-			 * For device private entries, since there are no
-			 * read exclusive entries, writable = !readable
-			 */
-			if (is_writable_device_private_entry(entry)) {
-				entry = make_readable_device_private_entry(swp_offset(entry));
-				pmd = swp_entry_to_pmd(entry);
-
-				if (pmd_swp_soft_dirty(*src_pmd))
-					pmd = pmd_swp_mksoft_dirty(pmd);
-				if (pmd_swp_uffd_wp(*src_pmd))
-					pmd = pmd_swp_mkuffd_wp(pmd);
-				set_pmd_at(src_mm, addr, src_pmd, pmd);
-			}
-
-			src_folio = pfn_swap_entry_folio(entry);
-			VM_WARN_ON(!folio_test_large(src_folio));
-
-			folio_get(src_folio);
-			/*
-			 * folio_try_dup_anon_rmap_pmd does not fail for
-			 * device private entries.
-			 */
-			folio_try_dup_anon_rmap_pmd(src_folio, &src_folio->page,
-							dst_vma, src_vma);
-		}
-
-		add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
-		mm_inc_nr_ptes(dst_mm);
-		pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
-		if (!userfaultfd_wp(dst_vma))
-			pmd = pmd_swp_clear_uffd_wp(pmd);
-		set_pmd_at(dst_mm, addr, dst_pmd, pmd);
+	if (unlikely(thp_migration_supported() && is_swap_pmd(pmd))) {
+		copy_huge_non_present_pmd(dst_mm, src_mm, dst_pmd, src_pmd, addr,
+					  dst_vma, src_vma, pmd, pgtable);
 		ret = 0;
 		goto out_unlock;
 	}
-#endif
 
 	if (unlikely(!pmd_trans_huge(pmd))) {
 		pte_free(dst_mm, pgtable);
-- 
2.51.0



^ permalink raw reply	[flat|nested] 47+ messages in thread

* [RFC PATCH 09/12] mm/huge_memory: refactor change_huge_pmd() non-present logic
  2025-10-24  7:41 [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion Lorenzo Stoakes
                   ` (7 preceding siblings ...)
  2025-10-24  7:41 ` [RFC PATCH 08/12] mm/huge_memory: refactor copy_huge_pmd() non-present logic Lorenzo Stoakes
@ 2025-10-24  7:41 ` Lorenzo Stoakes
  2025-10-24 18:41   ` Gregory Price
  2025-10-24  7:41 ` [RFC PATCH 10/12] mm: remove remaining is_swap_pmd() users and is_swap_pmd() Lorenzo Stoakes
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-24  7:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	David Hildenbrand, Alexander Gordeev, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Sven Schnelle, Zi Yan,
	Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, Pedro Falcato,
	Pasha Tatashin, Rik van Riel, Harry Yoo, kvm, linux-s390,
	linux-kernel, linux-fsdevel, linux-mm

Similar to copy_huge_pmd(), there is a large mass of open-coded logic for
the CONFIG_ARCH_ENABLE_THP_MIGRATION non-present entry case that does not
use thp_migration_supported() consistently.

Resolve this by separating out this logic and introduce
change_non_present_huge_pmd().

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/huge_memory.c | 72 ++++++++++++++++++++++++++----------------------
 1 file changed, 39 insertions(+), 33 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 755d38f82aec..fa928ca42b6d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2498,6 +2498,42 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 	return false;
 }
 
+static void change_non_present_huge_pmd(struct mm_struct *mm,
+		unsigned long addr, pmd_t *pmd, bool uffd_wp,
+		bool uffd_wp_resolve)
+{
+	swp_entry_t entry = pmd_to_swp_entry(*pmd);
+	struct folio *folio = pfn_swap_entry_folio(entry);
+	pmd_t newpmd;
+
+	VM_WARN_ON(!is_pmd_non_present_folio_entry(*pmd));
+	if (is_writable_migration_entry(entry)) {
+		/*
+		 * A protection check is difficult so
+		 * just be safe and disable write
+		 */
+		if (folio_test_anon(folio))
+			entry = make_readable_exclusive_migration_entry(swp_offset(entry));
+		else
+			entry = make_readable_migration_entry(swp_offset(entry));
+		newpmd = swp_entry_to_pmd(entry);
+		if (pmd_swp_soft_dirty(*pmd))
+			newpmd = pmd_swp_mksoft_dirty(newpmd);
+	} else if (is_writable_device_private_entry(entry)) {
+		entry = make_readable_device_private_entry(swp_offset(entry));
+		newpmd = swp_entry_to_pmd(entry);
+	} else {
+		newpmd = *pmd;
+	}
+
+	if (uffd_wp)
+		newpmd = pmd_swp_mkuffd_wp(newpmd);
+	else if (uffd_wp_resolve)
+		newpmd = pmd_swp_clear_uffd_wp(newpmd);
+	if (!pmd_same(*pmd, newpmd))
+		set_pmd_at(mm, addr, pmd, newpmd);
+}
+
 /*
  * Returns
  *  - 0 if PMD could not be locked
@@ -2526,41 +2562,11 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	if (!ptl)
 		return 0;
 
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
-	if (is_swap_pmd(*pmd)) {
-		swp_entry_t entry = pmd_to_swp_entry(*pmd);
-		struct folio *folio = pfn_swap_entry_folio(entry);
-		pmd_t newpmd;
-
-		VM_WARN_ON(!is_pmd_non_present_folio_entry(*pmd));
-		if (is_writable_migration_entry(entry)) {
-			/*
-			 * A protection check is difficult so
-			 * just be safe and disable write
-			 */
-			if (folio_test_anon(folio))
-				entry = make_readable_exclusive_migration_entry(swp_offset(entry));
-			else
-				entry = make_readable_migration_entry(swp_offset(entry));
-			newpmd = swp_entry_to_pmd(entry);
-			if (pmd_swp_soft_dirty(*pmd))
-				newpmd = pmd_swp_mksoft_dirty(newpmd);
-		} else if (is_writable_device_private_entry(entry)) {
-			entry = make_readable_device_private_entry(swp_offset(entry));
-			newpmd = swp_entry_to_pmd(entry);
-		} else {
-			newpmd = *pmd;
-		}
-
-		if (uffd_wp)
-			newpmd = pmd_swp_mkuffd_wp(newpmd);
-		else if (uffd_wp_resolve)
-			newpmd = pmd_swp_clear_uffd_wp(newpmd);
-		if (!pmd_same(*pmd, newpmd))
-			set_pmd_at(mm, addr, pmd, newpmd);
+	if (thp_migration_supported() && is_swap_pmd(*pmd)) {
+		change_non_present_huge_pmd(mm, addr, pmd, uffd_wp,
+					    uffd_wp_resolve);
 		goto unlock;
 	}
-#endif
 
 	if (prot_numa) {
 		int target_node = NUMA_NO_NODE;
-- 
2.51.0



^ permalink raw reply	[flat|nested] 47+ messages in thread

* [RFC PATCH 10/12] mm: remove remaining is_swap_pmd() users and is_swap_pmd()
  2025-10-24  7:41 [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion Lorenzo Stoakes
                   ` (8 preceding siblings ...)
  2025-10-24  7:41 ` [RFC PATCH 09/12] mm/huge_memory: refactor change_huge_pmd() " Lorenzo Stoakes
@ 2025-10-24  7:41 ` Lorenzo Stoakes
  2025-10-24  7:41 ` [RFC PATCH 11/12] mm: rename non_swap_entry() to is_non_present_entry() Lorenzo Stoakes
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-24  7:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	David Hildenbrand, Alexander Gordeev, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Sven Schnelle, Zi Yan,
	Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, Pedro Falcato,
	Pasha Tatashin, Rik van Riel, Harry Yoo, kvm, linux-s390,
	linux-kernel, linux-fsdevel, linux-mm

Update copy_huge_pmd() and change_huge_pmd() to use
is_pmd_non_present_folio_entry() - as this checks for the only valid
non-present huge PMD states.

Also update mm/debug_vm_pgtable.c to explicitly test for a valid
non-present PMD entry (which it was not before, which was incorrect), and
have it test against is_huge_pmd() and is_pmd_non_present_folio_entry()
rather than is_swap_pmd().

With these changes done there are no further users of is_swap_pmd(), so
remove it.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 include/linux/huge_mm.h |  9 ---------
 mm/debug_vm_pgtable.c   | 25 +++++++++++++++----------
 mm/huge_memory.c        |  5 +++--
 3 files changed, 18 insertions(+), 21 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 892cb825dfc7..0c3a002dc10f 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -484,11 +484,6 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start,
 spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma);
 spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma);
 
-static inline int is_swap_pmd(pmd_t pmd)
-{
-	return !pmd_none(pmd) && !pmd_present(pmd);
-}
-
 /* mmap_lock must be held on entry */
 static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
 		struct vm_area_struct *vma)
@@ -684,10 +679,6 @@ static inline void vma_adjust_trans_huge(struct vm_area_struct *vma,
 					 struct vm_area_struct *next)
 {
 }
-static inline int is_swap_pmd(pmd_t pmd)
-{
-	return 0;
-}
 static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
 		struct vm_area_struct *vma)
 {
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index d4b2835569ce..5b8b0024a492 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -74,6 +74,7 @@ struct pgtable_debug_args {
 	unsigned long		fixed_pte_pfn;
 
 	swp_entry_t		swp_entry;
+	swp_entry_t		non_present_swp_entry;
 };
 
 static void __init pte_basic_tests(struct pgtable_debug_args *args, int idx)
@@ -731,7 +732,7 @@ static void __init pmd_soft_dirty_tests(struct pgtable_debug_args *args)
 	WARN_ON(pmd_soft_dirty(pmd_clear_soft_dirty(pmd)));
 }
 
-static void __init pmd_swap_soft_dirty_tests(struct pgtable_debug_args *args)
+static void __init pmd_non_present_soft_dirty_tests(struct pgtable_debug_args *args)
 {
 	pmd_t pmd;
 
@@ -743,15 +744,16 @@ static void __init pmd_swap_soft_dirty_tests(struct pgtable_debug_args *args)
 		return;
 
 	pr_debug("Validating PMD swap soft dirty\n");
-	pmd = swp_entry_to_pmd(args->swp_entry);
-	WARN_ON(!is_swap_pmd(pmd));
+	pmd = swp_entry_to_pmd(args->non_present_swp_entry);
+	WARN_ON(!is_huge_pmd(pmd));
+	WARN_ON(!is_pmd_non_present_folio_entry(pmd));
 
 	WARN_ON(!pmd_swp_soft_dirty(pmd_swp_mksoft_dirty(pmd)));
 	WARN_ON(pmd_swp_soft_dirty(pmd_swp_clear_soft_dirty(pmd)));
 }
 #else  /* !CONFIG_TRANSPARENT_HUGEPAGE */
 static void __init pmd_soft_dirty_tests(struct pgtable_debug_args *args) { }
-static void __init pmd_swap_soft_dirty_tests(struct pgtable_debug_args *args) { }
+static void __init pmd_non_present_soft_dirty_tests(struct pgtable_debug_args *args) { }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 static void __init pte_swap_exclusive_tests(struct pgtable_debug_args *args)
@@ -796,7 +798,7 @@ static void __init pte_swap_tests(struct pgtable_debug_args *args)
 }
 
 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
-static void __init pmd_swap_tests(struct pgtable_debug_args *args)
+static void __init pmd_non_present_tests(struct pgtable_debug_args *args)
 {
 	swp_entry_t arch_entry;
 	pmd_t pmd1, pmd2;
@@ -805,15 +807,16 @@ static void __init pmd_swap_tests(struct pgtable_debug_args *args)
 		return;
 
 	pr_debug("Validating PMD swap\n");
-	pmd1 = swp_entry_to_pmd(args->swp_entry);
-	WARN_ON(!is_swap_pmd(pmd1));
+	pmd1 = swp_entry_to_pmd(args->non_present_swp_entry);
+	WARN_ON(!is_huge_pmd(pmd1));
+	WARN_ON(!is_pmd_non_present_folio_entry(pmd1));
 
 	arch_entry = __pmd_to_swp_entry(pmd1);
 	pmd2 = __swp_entry_to_pmd(arch_entry);
 	WARN_ON(memcmp(&pmd1, &pmd2, sizeof(pmd1)));
 }
 #else  /* !CONFIG_ARCH_ENABLE_THP_MIGRATION */
-static void __init pmd_swap_tests(struct pgtable_debug_args *args) { }
+static void __init pmd_non_present_tests(struct pgtable_debug_args *args) { }
 #endif /* CONFIG_ARCH_ENABLE_THP_MIGRATION */
 
 static void __init swap_migration_tests(struct pgtable_debug_args *args)
@@ -1207,6 +1210,8 @@ static int __init init_args(struct pgtable_debug_args *args)
 	max_swap_offset = swp_offset(pte_to_swp_entry(swp_entry_to_pte(swp_entry(0, ~0UL))));
 	/* Create a swp entry with all possible bits set while still being swap. */
 	args->swp_entry = swp_entry(MAX_SWAPFILES - 1, max_swap_offset);
+	/* Create a non-present migration entry. */
+	args->non_present_swp_entry = make_writable_migration_entry(~0UL);
 
 	/*
 	 * Allocate (huge) pages because some of the tests need to access
@@ -1296,12 +1301,12 @@ static int __init debug_vm_pgtable(void)
 	pte_soft_dirty_tests(&args);
 	pmd_soft_dirty_tests(&args);
 	pte_swap_soft_dirty_tests(&args);
-	pmd_swap_soft_dirty_tests(&args);
+	pmd_non_present_soft_dirty_tests(&args);
 
 	pte_swap_exclusive_tests(&args);
 
 	pte_swap_tests(&args);
-	pmd_swap_tests(&args);
+	pmd_non_present_tests(&args);
 
 	swap_migration_tests(&args);
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index fa928ca42b6d..a16da67684b4 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1874,7 +1874,8 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	ret = -EAGAIN;
 	pmd = *src_pmd;
 
-	if (unlikely(thp_migration_supported() && is_swap_pmd(pmd))) {
+	if (unlikely(thp_migration_supported() &&
+		     is_pmd_non_present_folio_entry(pmd))) {
 		copy_huge_non_present_pmd(dst_mm, src_mm, dst_pmd, src_pmd, addr,
 					  dst_vma, src_vma, pmd, pgtable);
 		ret = 0;
@@ -2562,7 +2563,7 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	if (!ptl)
 		return 0;
 
-	if (thp_migration_supported() && is_swap_pmd(*pmd)) {
+	if (thp_migration_supported() && is_pmd_non_present_folio_entry(*pmd)) {
 		change_non_present_huge_pmd(mm, addr, pmd, uffd_wp,
 					    uffd_wp_resolve);
 		goto unlock;
-- 
2.51.0



^ permalink raw reply	[flat|nested] 47+ messages in thread

* [RFC PATCH 11/12] mm: rename non_swap_entry() to is_non_present_entry()
  2025-10-24  7:41 [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion Lorenzo Stoakes
                   ` (9 preceding siblings ...)
  2025-10-24  7:41 ` [RFC PATCH 10/12] mm: remove remaining is_swap_pmd() users and is_swap_pmd() Lorenzo Stoakes
@ 2025-10-24  7:41 ` Lorenzo Stoakes
  2025-10-24 19:07   ` Gregory Price
  2025-10-24  7:41 ` [RFC PATCH 12/12] mm: provide is_swap_entry() and use it Lorenzo Stoakes
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-24  7:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	David Hildenbrand, Alexander Gordeev, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Sven Schnelle, Zi Yan,
	Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, Pedro Falcato,
	Pasha Tatashin, Rik van Riel, Harry Yoo, kvm, linux-s390,
	linux-kernel, linux-fsdevel, linux-mm

Referring to non-swap swap entries is simply confusing. While we store
non-present entries unrelated to swap itself, in swp_entry_t fields, we can
avoid referring to them as 'non-swap' entries.

Instead, refer to them as non-present entries, as opposed to strictly swap
entries. We achieve this by renaming non_swap_entry() to
is_non_present_entry(), which also adds an 'is_' prefix for consistency
with other similar predicate functions.

Also update comments accordingly.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 arch/s390/mm/gmap_helpers.c |  2 +-
 arch/s390/mm/pgtable.c      |  2 +-
 fs/proc/task_mmu.c          |  2 +-
 include/linux/swapops.h     | 21 ++++++++++++++++-----
 mm/filemap.c                |  2 +-
 mm/hmm.c                    |  2 +-
 mm/madvise.c                |  4 ++--
 mm/memory.c                 |  8 ++++----
 mm/mincore.c                |  2 +-
 mm/userfaultfd.c            |  2 +-
 10 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/arch/s390/mm/gmap_helpers.c b/arch/s390/mm/gmap_helpers.c
index d4c3c36855e2..2c41276a34c5 100644
--- a/arch/s390/mm/gmap_helpers.c
+++ b/arch/s390/mm/gmap_helpers.c
@@ -28,7 +28,7 @@
  */
 static void ptep_zap_swap_entry(struct mm_struct *mm, swp_entry_t entry)
 {
-	if (!non_swap_entry(entry))
+	if (!is_non_present_entry(entry))
 		dec_mm_counter(mm, MM_SWAPENTS);
 	else if (is_migration_entry(entry))
 		dec_mm_counter(mm, mm_counter(pfn_swap_entry_folio(entry)));
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 0fde20bbc50b..0c795f3c324f 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -685,7 +685,7 @@ void ptep_unshadow_pte(struct mm_struct *mm, unsigned long saddr, pte_t *ptep)
 
 static void ptep_zap_swap_entry(struct mm_struct *mm, swp_entry_t entry)
 {
-	if (!non_swap_entry(entry))
+	if (!is_non_present_entry(entry))
 		dec_mm_counter(mm, MM_SWAPENTS);
 	else if (is_migration_entry(entry)) {
 		struct folio *folio = pfn_swap_entry_folio(entry);
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 1c32a0e2b965..28f30e01e504 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1022,7 +1022,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
 	} else {
 		swp_entry_t swpent = pte_to_swp_entry(ptent);
 
-		if (!non_swap_entry(swpent)) {
+		if (!is_non_present_entry(swpent)) {
 			int mapcount;
 
 			mss->swap += PAGE_SIZE;
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 8642e590504a..fb463d75fa90 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -645,7 +645,18 @@ static inline int is_pmd_device_private_entry(pmd_t pmd)
 
 #endif /* CONFIG_ZONE_DEVICE && CONFIG_ARCH_ENABLE_THP_MIGRATION */
 
-static inline int non_swap_entry(swp_entry_t entry)
+/**
+ * is_non_present_entry() - Determine if this is a miscellaneous
+ * non-present entry.
+ * @entry: The entry to examine.
+ *
+ * This function determines whether data encoded in non-present leaf page
+ * tables is a migration entry, device private entry, marker entry, etc. -
+ * that is a non-present entry that is not a swap entry.
+ *
+ * Returns: true if is a non-present entry, otherwise false.
+ */
+static inline bool is_non_present_entry(swp_entry_t entry)
 {
 	return swp_type(entry) >= MAX_SWAPFILES;
 }
@@ -661,9 +672,9 @@ static inline int is_pmd_non_present_folio_entry(pmd_t pmd)
  * @entryp: Output pointer to a swap entry that will be populated upon
  * success.
  *
- * Determines if the PTE describes an entry in swap or swap cache (i.e. is a
- * swap entry and not a non-swap entry), if so it sets @entryp to the swap
- * entry.
+ * Determines if the PTE describes an entry in swap or swap cache (i.e. is
+ * a swap entry and not a non-present entry), if so it sets @entryp to the
+ * swap entry.
  *
  * This should only be used if we do not have any prior knowledge of this
  * PTE's state.
@@ -678,7 +689,7 @@ static inline bool get_pte_swap_entry(pte_t pte, swp_entry_t *entryp)
 		return false;
 
 	*entryp = pte_to_swp_entry(pte);
-	if (non_swap_entry(*entryp))
+	if (is_non_present_entry(*entryp))
 		return false;
 
 	return true;
diff --git a/mm/filemap.c b/mm/filemap.c
index 893ba49808b7..1440e176e124 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4553,7 +4553,7 @@ static void filemap_cachestat(struct address_space *mapping,
 				swp_entry_t swp = radix_to_swp_entry(folio);
 
 				/* swapin error results in poisoned entry */
-				if (non_swap_entry(swp))
+				if (is_non_present_entry(swp))
 					goto resched;
 
 				/*
diff --git a/mm/hmm.c b/mm/hmm.c
index a56081d67ad6..66e18b28a21d 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -274,7 +274,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 		if (!required_fault)
 			goto out;
 
-		if (!non_swap_entry(entry))
+		if (!is_non_present_entry(entry))
 			goto fault;
 
 		if (is_device_private_entry(entry))
diff --git a/mm/madvise.c b/mm/madvise.c
index 578036ef6675..a259dae2b899 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -248,7 +248,7 @@ static void shmem_swapin_range(struct vm_area_struct *vma,
 			continue;
 		entry = radix_to_swp_entry(folio);
 		/* There might be swapin error entries in shmem mapping. */
-		if (non_swap_entry(entry))
+		if (is_non_present_entry(entry))
 			continue;
 
 		addr = vma->vm_start +
@@ -690,7 +690,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 			swp_entry_t entry;
 
 			entry = pte_to_swp_entry(ptent);
-			if (!non_swap_entry(entry)) {
+			if (!is_non_present_entry(entry)) {
 				max_nr = (end - addr) / PAGE_SIZE;
 				nr = swap_pte_batch(pte, max_nr, ptent);
 				nr_swap -= nr;
diff --git a/mm/memory.c b/mm/memory.c
index cc163060933f..8968ba0b076f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -847,7 +847,7 @@ struct page *vm_normal_page_pud(struct vm_area_struct *vma,
  * @ptep: pte pointer into the locked page table mapping the folio page
  * @orig_pte: pte value at @ptep
  *
- * Restore a device-exclusive non-swap entry to an ordinary present pte.
+ * Restore a device-exclusive non-present entry to an ordinary present pte.
  *
  * The folio and the page table must be locked, and MMU notifiers must have
  * been called to invalidate any (exclusive) device mappings.
@@ -931,7 +931,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	struct page *page;
 	swp_entry_t entry = pte_to_swp_entry(orig_pte);
 
-	if (likely(!non_swap_entry(entry))) {
+	if (likely(!is_non_present_entry(entry))) {
 		if (swap_duplicate(entry) < 0)
 			return -EIO;
 
@@ -1739,7 +1739,7 @@ static inline int zap_nonpresent_ptes(struct mmu_gather *tlb,
 		rss[mm_counter(folio)]--;
 		folio_remove_rmap_pte(folio, page, vma);
 		folio_put(folio);
-	} else if (!non_swap_entry(entry)) {
+	} else if (!is_non_present_entry(entry)) {
 		/* Genuine swap entries, hence a private anon pages */
 		if (!should_zap_cows(details))
 			return 1;
@@ -4646,7 +4646,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		goto out;
 
 	entry = pte_to_swp_entry(vmf->orig_pte);
-	if (unlikely(non_swap_entry(entry))) {
+	if (unlikely(is_non_present_entry(entry))) {
 		if (is_migration_entry(entry)) {
 			migration_entry_wait(vma->vm_mm, vmf->pmd,
 					     vmf->address);
diff --git a/mm/mincore.c b/mm/mincore.c
index 8ec4719370e1..61531a7cd8b0 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -63,7 +63,7 @@ static unsigned char mincore_swap(swp_entry_t entry, bool shmem)
 	 * absent. Page table may contain migration or hwpoison
 	 * entries which are always uptodate.
 	 */
-	if (non_swap_entry(entry))
+	if (is_non_present_entry(entry))
 		return !shmem;
 
 	/*
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 00122f42718c..04fab82a1119 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1427,7 +1427,7 @@ static long move_pages_ptes(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd
 		struct folio *folio = NULL;
 
 		entry = pte_to_swp_entry(orig_src_pte);
-		if (non_swap_entry(entry)) {
+		if (is_non_present_entry(entry)) {
 			if (is_migration_entry(entry)) {
 				pte_unmap(src_pte);
 				pte_unmap(dst_pte);
-- 
2.51.0



^ permalink raw reply	[flat|nested] 47+ messages in thread

* [RFC PATCH 12/12] mm: provide is_swap_entry() and use it
  2025-10-24  7:41 [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion Lorenzo Stoakes
                   ` (10 preceding siblings ...)
  2025-10-24  7:41 ` [RFC PATCH 11/12] mm: rename non_swap_entry() to is_non_present_entry() Lorenzo Stoakes
@ 2025-10-24  7:41 ` Lorenzo Stoakes
  2025-10-24 20:05 ` [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion Yosry Ahmed
  2025-10-27 16:09 ` Jason Gunthorpe
  13 siblings, 0 replies; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-24  7:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	David Hildenbrand, Alexander Gordeev, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Sven Schnelle, Zi Yan,
	Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, Pedro Falcato,
	Pasha Tatashin, Rik van Riel, Harry Yoo, kvm, linux-s390,
	linux-kernel, linux-fsdevel, linux-mm

Previously we have been in the insane situation where people check whether
we are in fact dealing with a swap entry by negating non_swap_entry() - so
determining if a swap entry is an entry for swap by checking that it's not
a not swap entry.

This is really rather sub-optimal, so rather than engaging in this dance,
and now we've eliminated confusing is_swap_pte() and is_swap_pmd() helpers,
and renamed non-swap entries to non-present entries, we are well placed to
introduce a new helper.

We therefore introduce is_swap_entry() for this purpose which simply
determines if a swp_entry_t value encodes an actual swap entry, and update
relevant callers to use this instead.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 arch/s390/mm/gmap_helpers.c |  2 +-
 arch/s390/mm/pgtable.c      |  2 +-
 fs/proc/task_mmu.c          |  2 +-
 include/linux/swapops.h     | 15 +++++++++++++++
 mm/madvise.c                |  2 +-
 mm/memory.c                 |  4 ++--
 6 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/arch/s390/mm/gmap_helpers.c b/arch/s390/mm/gmap_helpers.c
index 2c41276a34c5..222a26d09cbb 100644
--- a/arch/s390/mm/gmap_helpers.c
+++ b/arch/s390/mm/gmap_helpers.c
@@ -28,7 +28,7 @@
  */
 static void ptep_zap_swap_entry(struct mm_struct *mm, swp_entry_t entry)
 {
-	if (!is_non_present_entry(entry))
+	if (is_swap_entry(entry))
 		dec_mm_counter(mm, MM_SWAPENTS);
 	else if (is_migration_entry(entry))
 		dec_mm_counter(mm, mm_counter(pfn_swap_entry_folio(entry)));
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 0c795f3c324f..a15befcf6a8c 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -685,7 +685,7 @@ void ptep_unshadow_pte(struct mm_struct *mm, unsigned long saddr, pte_t *ptep)
 
 static void ptep_zap_swap_entry(struct mm_struct *mm, swp_entry_t entry)
 {
-	if (!is_non_present_entry(entry))
+	if (is_swap_entry(entry))
 		dec_mm_counter(mm, MM_SWAPENTS);
 	else if (is_migration_entry(entry)) {
 		struct folio *folio = pfn_swap_entry_folio(entry);
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 28f30e01e504..d62fdae57dce 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1022,7 +1022,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
 	} else {
 		swp_entry_t swpent = pte_to_swp_entry(ptent);
 
-		if (!is_non_present_entry(swpent)) {
+		if (is_swap_entry(swpent)) {
 			int mapcount;
 
 			mss->swap += PAGE_SIZE;
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index fb463d75fa90..c96c31671230 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -661,6 +661,21 @@ static inline bool is_non_present_entry(swp_entry_t entry)
 	return swp_type(entry) >= MAX_SWAPFILES;
 }
 
+/**
+ * is_swap_entry() - Determines if this is a swap entry.
+ * @entry: The entry to examine.
+ *
+ * Determines whether data encoded in non-present leaf page tables is a
+ * swap entry and NOT a migration entry, device private entry, market
+ * entry, etc.
+ *
+ * Returns true if it is a swap entry, otherwise false.
+ */
+static inline bool is_swap_entry(swp_entry_t entry)
+{
+	return !is_non_present_entry(entry);
+}
+
 static inline int is_pmd_non_present_folio_entry(pmd_t pmd)
 {
 	return is_pmd_migration_entry(pmd) || is_pmd_device_private_entry(pmd);
diff --git a/mm/madvise.c b/mm/madvise.c
index a259dae2b899..4bf098986cb4 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -690,7 +690,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 			swp_entry_t entry;
 
 			entry = pte_to_swp_entry(ptent);
-			if (!is_non_present_entry(entry)) {
+			if (is_swap_entry(entry)) {
 				max_nr = (end - addr) / PAGE_SIZE;
 				nr = swap_pte_batch(pte, max_nr, ptent);
 				nr_swap -= nr;
diff --git a/mm/memory.c b/mm/memory.c
index 8968ba0b076f..4f4179eb70c0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -931,7 +931,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	struct page *page;
 	swp_entry_t entry = pte_to_swp_entry(orig_pte);
 
-	if (likely(!is_non_present_entry(entry))) {
+	if (likely(is_swap_entry(entry))) {
 		if (swap_duplicate(entry) < 0)
 			return -EIO;
 
@@ -1739,7 +1739,7 @@ static inline int zap_nonpresent_ptes(struct mmu_gather *tlb,
 		rss[mm_counter(folio)]--;
 		folio_remove_rmap_pte(folio, page, vma);
 		folio_put(folio);
-	} else if (!is_non_present_entry(entry)) {
+	} else if (is_swap_entry(entry)) {
 		/* Genuine swap entries, hence a private anon pages */
 		if (!should_zap_cows(details))
 			return 1;
-- 
2.51.0



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 05/12] fs/proc/task_mmu: refactor pagemap_pmd_range()
  2025-10-24  7:41 ` [RFC PATCH 05/12] fs/proc/task_mmu: refactor pagemap_pmd_range() Lorenzo Stoakes
@ 2025-10-24 17:32   ` Gregory Price
  2025-10-24 18:19     ` Lorenzo Stoakes
  0 siblings, 1 reply; 47+ messages in thread
From: Gregory Price @ 2025-10-24 17:32 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, David Hildenbrand, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle,
	Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Ying Huang, Alistair Popple, Pedro Falcato, Pasha Tatashin,
	Rik van Riel, Harry Yoo, kvm, linux-s390, linux-kernel,
	linux-fsdevel, linux-mm

On Fri, Oct 24, 2025 at 08:41:21AM +0100, Lorenzo Stoakes wrote:
> Separate out THP logic so we can drop an indentation level and reduce the
> amount of noise in this function.
> 
> We add pagemap_pmd_range_thp() for this purpose.
> 
> While we're here, convert the VM_BUG_ON() to a VM_WARN_ON_ONCE() at the
> same time.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
... >8
> +static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
> +			     struct mm_walk *walk)
> +{
> +	struct vm_area_struct *vma = walk->vma;
> +	struct pagemapread *pm = walk->private;
> +	spinlock_t *ptl;
> +	pte_t *pte, *orig_pte;
> +	int err = 0;
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +	ptl = pmd_trans_huge_lock(pmdp, vma);
> +	if (ptl)
> +		return pagemap_pmd_range_thp(pmdp, addr, end, vma, pm, ptl);
> +#endif

Maybe something like...

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
ptl = pmd_trans_huge_lock(pmdp, vma);
if (ptl) {
	err = pagemap_pmd_range_thp(pmdp, addr, end, vma, pm, ptl);
	spin_unlock(ptl);
	return err;
}
#endif

and drop the spin_unlock(ptl) calls from pagemap_pmd_range_thp?

Makes it easier to understand the locking semantics.

Might be worth adding a comment to pagemap_pmd_range_thp that callers
must hold the ptl lock.

~Gregory

P.S. This patch set made my day, the whole non-swap-swap thing has
always broken my brain.  <3


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 05/12] fs/proc/task_mmu: refactor pagemap_pmd_range()
  2025-10-24 17:32   ` Gregory Price
@ 2025-10-24 18:19     ` Lorenzo Stoakes
  2025-10-24 19:12       ` Gregory Price
  0 siblings, 1 reply; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-24 18:19 UTC (permalink / raw)
  To: Gregory Price
  Cc: Andrew Morton, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, David Hildenbrand, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle,
	Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Ying Huang, Alistair Popple, Pedro Falcato, Pasha Tatashin,
	Rik van Riel, Harry Yoo, kvm, linux-s390, linux-kernel,
	linux-fsdevel, linux-mm

On Fri, Oct 24, 2025 at 01:32:29PM -0400, Gregory Price wrote:
> On Fri, Oct 24, 2025 at 08:41:21AM +0100, Lorenzo Stoakes wrote:
> > Separate out THP logic so we can drop an indentation level and reduce the
> > amount of noise in this function.
> >
> > We add pagemap_pmd_range_thp() for this purpose.
> >
> > While we're here, convert the VM_BUG_ON() to a VM_WARN_ON_ONCE() at the
> > same time.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ... >8
> > +static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
> > +			     struct mm_walk *walk)
> > +{
> > +	struct vm_area_struct *vma = walk->vma;
> > +	struct pagemapread *pm = walk->private;
> > +	spinlock_t *ptl;
> > +	pte_t *pte, *orig_pte;
> > +	int err = 0;
> > +
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > +	ptl = pmd_trans_huge_lock(pmdp, vma);
> > +	if (ptl)
> > +		return pagemap_pmd_range_thp(pmdp, addr, end, vma, pm, ptl);
> > +#endif
>
> Maybe something like...
>
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> ptl = pmd_trans_huge_lock(pmdp, vma);
> if (ptl) {
> 	err = pagemap_pmd_range_thp(pmdp, addr, end, vma, pm, ptl);
> 	spin_unlock(ptl);
> 	return err;
> }
> #endif
>
> and drop the spin_unlock(ptl) calls from pagemap_pmd_range_thp?

Ah yeah that's a good idea, will do that on respin!

>
> Makes it easier to understand the locking semantics.

Absolutely.

>
> Might be worth adding a comment to pagemap_pmd_range_thp that callers
> must hold the ptl lock.

Ack will do!

>
> ~Gregory
>
> P.S. This patch set made my day, the whole non-swap-swap thing has
> always broken my brain.  <3

Thanks :) yeah this came out of my advocating _for_ is_swap_pte() on a series,
because hey - if you're going to operate on PTEs based on them being 'xxx', and
you have a predicate 'is_xxx()' it follows that you should only do those
operations if you have applied the predicate right?

But of course we largely don't do that, so we end up with horrible confusion
between a convention that not-none, not-present = swap entry + this predicate.

PLUS on top of that, we have the 'non-swap swap entry' so we have not-none,
not-present = swap, or swap that is not swap but also swap but hey definitely
isn't and and... :)

A next step will be to at least rename swp_entry_t to something else, because
every last remnant of this 'swap entries but not really' needs to be dealt
with...

Anyway this goes a long way to getting there :)

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 09/12] mm/huge_memory: refactor change_huge_pmd() non-present logic
  2025-10-24  7:41 ` [RFC PATCH 09/12] mm/huge_memory: refactor change_huge_pmd() " Lorenzo Stoakes
@ 2025-10-24 18:41   ` Gregory Price
  2025-10-24 18:44     ` Lorenzo Stoakes
  0 siblings, 1 reply; 47+ messages in thread
From: Gregory Price @ 2025-10-24 18:41 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, David Hildenbrand, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle,
	Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Ying Huang, Alistair Popple, Pedro Falcato, Pasha Tatashin,
	Rik van Riel, Harry Yoo, kvm, linux-s390, linux-kernel,
	linux-fsdevel, linux-mm

On Fri, Oct 24, 2025 at 08:41:25AM +0100, Lorenzo Stoakes wrote:
> Similar to copy_huge_pmd(), there is a large mass of open-coded logic for
> the CONFIG_ARCH_ENABLE_THP_MIGRATION non-present entry case that does not
> use thp_migration_supported() consistently.
> 
> Resolve this by separating out this logic and introduce
> change_non_present_huge_pmd().
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
--- >8

> +	if (thp_migration_supported() && is_swap_pmd(*pmd)) {
> +		change_non_present_huge_pmd(mm, addr, pmd, uffd_wp,
> +					    uffd_wp_resolve);

You point out the original code doesn't have thp_migration_supported()

is this a bug? or is it benign and just leads to it failing (nicely)
deeper in the stack?

If it's a bug, maybe this patch should be pulled out ahead of the rest?

~Gregory


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 09/12] mm/huge_memory: refactor change_huge_pmd() non-present logic
  2025-10-24 18:41   ` Gregory Price
@ 2025-10-24 18:44     ` Lorenzo Stoakes
  2025-10-24 19:09       ` Gregory Price
  0 siblings, 1 reply; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-24 18:44 UTC (permalink / raw)
  To: Gregory Price
  Cc: Andrew Morton, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, David Hildenbrand, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle,
	Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Ying Huang, Alistair Popple, Pedro Falcato, Pasha Tatashin,
	Rik van Riel, Harry Yoo, kvm, linux-s390, linux-kernel,
	linux-fsdevel, linux-mm

On Fri, Oct 24, 2025 at 02:41:02PM -0400, Gregory Price wrote:
> On Fri, Oct 24, 2025 at 08:41:25AM +0100, Lorenzo Stoakes wrote:
> > Similar to copy_huge_pmd(), there is a large mass of open-coded logic for
> > the CONFIG_ARCH_ENABLE_THP_MIGRATION non-present entry case that does not
> > use thp_migration_supported() consistently.
> >
> > Resolve this by separating out this logic and introduce
> > change_non_present_huge_pmd().
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> --- >8
>
> > +	if (thp_migration_supported() && is_swap_pmd(*pmd)) {
> > +		change_non_present_huge_pmd(mm, addr, pmd, uffd_wp,
> > +					    uffd_wp_resolve);
>
> You point out the original code doesn't have thp_migration_supported()
>
> is this a bug? or is it benign and just leads to it failing (nicely)
> deeper in the stack?
>
> If it's a bug, maybe this patch should be pulled out ahead of the rest?

No it's not a bug, I mean it does:

#ifdef CONFIG_ARCH_ENBLE_THP_MIGRATION
if (is_swap_pmd(*pmd)) {
	...
}
#endif

Instead of the much nicer:

if (thp_migration_supported() && is_swap_pmd(*pmd)) {
}

Given:

static inline bool thp_migration_supported(void)
{
	return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);
}

You can see it's equivalent except we rely on compiler removing dead code when
we use thp_migration_supported() obviously (which is fine)

>
> ~Gregory

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 11/12] mm: rename non_swap_entry() to is_non_present_entry()
  2025-10-24  7:41 ` [RFC PATCH 11/12] mm: rename non_swap_entry() to is_non_present_entry() Lorenzo Stoakes
@ 2025-10-24 19:07   ` Gregory Price
  2025-10-24 20:17     ` Lorenzo Stoakes
  0 siblings, 1 reply; 47+ messages in thread
From: Gregory Price @ 2025-10-24 19:07 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, David Hildenbrand, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle,
	Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Ying Huang, Alistair Popple, Pedro Falcato, Pasha Tatashin,
	Rik van Riel, Harry Yoo, kvm, linux-s390, linux-kernel,
	linux-fsdevel, linux-mm

On Fri, Oct 24, 2025 at 08:41:27AM +0100, Lorenzo Stoakes wrote:
> Referring to non-swap swap entries is simply confusing. While we store
> non-present entries unrelated to swap itself, in swp_entry_t fields, we can
> avoid referring to them as 'non-swap' entries.
> 
--- >8
>  static void ptep_zap_swap_entry(struct mm_struct *mm, swp_entry_t entry)
>  {
> -	if (!non_swap_entry(entry))
> +	if (!is_non_present_entry(entry))
>  		dec_mm_counter(mm, MM_SWAPENTS);

I guess the question I have here is whether it's feasible to invert the
logic to avoid the double-negative not-logic.

Anyway, naming is hard. In general I appreciate the additional clarity,
even if we still have some `!is_non_*` logic sprinkled about.

--- addt'l aside semi-unrelated to your patches

I can see where this is going in the long run, but the name of this
function (ptep_zap_swap_entry) is as frustrating as the check for
non_swap_entry(entry).

may as well call it `ptep_zap_leaf_thingy` if it's handling multiple
special entry types.

but renaming even more functions in strange places outside scope here.

---

~Gregory


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 09/12] mm/huge_memory: refactor change_huge_pmd() non-present logic
  2025-10-24 18:44     ` Lorenzo Stoakes
@ 2025-10-24 19:09       ` Gregory Price
  0 siblings, 0 replies; 47+ messages in thread
From: Gregory Price @ 2025-10-24 19:09 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, David Hildenbrand, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle,
	Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Ying Huang, Alistair Popple, Pedro Falcato, Pasha Tatashin,
	Rik van Riel, Harry Yoo, kvm, linux-s390, linux-kernel,
	linux-fsdevel, linux-mm

On Fri, Oct 24, 2025 at 07:44:41PM +0100, Lorenzo Stoakes wrote:
> On Fri, Oct 24, 2025 at 02:41:02PM -0400, Gregory Price wrote:
> 
> You can see it's equivalent except we rely on compiler removing dead code when
> we use thp_migration_supported() obviously (which is fine)
> 

derp - disregard.  End of the day friday is probably not the time to
be doing core patch reviews :P.

Cheers,
~Gregory


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 05/12] fs/proc/task_mmu: refactor pagemap_pmd_range()
  2025-10-24 18:19     ` Lorenzo Stoakes
@ 2025-10-24 19:12       ` Gregory Price
  2025-10-24 20:15         ` Lorenzo Stoakes
  0 siblings, 1 reply; 47+ messages in thread
From: Gregory Price @ 2025-10-24 19:12 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, David Hildenbrand, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle,
	Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Ying Huang, Alistair Popple, Pedro Falcato, Pasha Tatashin,
	Rik van Riel, Harry Yoo, kvm, linux-s390, linux-kernel,
	linux-fsdevel, linux-mm

On Fri, Oct 24, 2025 at 07:19:11PM +0100, Lorenzo Stoakes wrote:
> On Fri, Oct 24, 2025 at 01:32:29PM -0400, Gregory Price wrote:
> 
> A next step will be to at least rename swp_entry_t to something else, because
> every last remnant of this 'swap entries but not really' needs to be dealt
> with...
>

hah, was just complaining about this on the other patch.

ptleaf_entry_t?

:shrug:

keep fighting the good fight
~Gregory


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
  2025-10-24  7:41 [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion Lorenzo Stoakes
                   ` (11 preceding siblings ...)
  2025-10-24  7:41 ` [RFC PATCH 12/12] mm: provide is_swap_entry() and use it Lorenzo Stoakes
@ 2025-10-24 20:05 ` Yosry Ahmed
  2025-10-24 20:14   ` Lorenzo Stoakes
  2025-10-27 16:09 ` Jason Gunthorpe
  13 siblings, 1 reply; 47+ messages in thread
From: Yosry Ahmed @ 2025-10-24 20:05 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, David Hildenbrand, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle,
	Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, Pedro Falcato,
	Pasha Tatashin, Rik van Riel, Harry Yoo, kvm, linux-s390,
	linux-kernel, linux-fsdevel, linux-mm

On Fri, Oct 24, 2025 at 08:41:16AM +0100, Lorenzo Stoakes wrote:
> There's an established convention in the kernel that we treat leaf page
> tables (so far at the PTE, PMD level) as containing 'swap entries' should
> they be neither empty (i.e. p**_none() evaluating true) nor present
> (i.e. p**_present() evaluating true).
> 
> However, at the same time we also have helper predicates - is_swap_pte(),
> is_swap_pmd() - which are inconsistently used.
> 
> This is problematic, as it is logical to assume that should somebody wish
> to operate upon a page table swap entry they should first check to see if
> it is in fact one.
> 
> It also implies that perhaps, in future, we might introduce a non-present,
> none page table entry that is not a swap entry.
> 
> This series resolves this issue by systematically eliminating all use of
> the is_swap_pte() and is swap_pmd() predicates so we retain only the
> convention that should a leaf page table entry be neither none nor present
> it is a swap entry.
> 
> We also have the further issue that 'swap entry' is unfortunately a really
> rather overloaded term and in fact refers to both entries for swap and for
> other information such as migration entries, page table markers, and device
> private entries.
> 
> We therefore have the rather 'unique' concept of a 'non-swap' swap entry.
> 
> This is deeply confusing, so this series goes further and eliminates the
> non_swap_entry() predicate, replacing it with is_non_present_entry() - with
> an eye to a new convention of referring to these non-swap 'swap entries' as
> non-present.

I just wanted to say THANK YOU for doing this. It is indeed a very
annoying and confusing convention, and I wanted to do something about it
in the past but never got around to it..

> 
> It also introduces the is_swap_entry() predicate to explicitly and
> logically refer to actual 'true' swap entries, improving code readibility,
> avoiding the hideous convention of:
> 
> 	if (!non_swap_entry(entry)) {
> 		...
> 	}
> 
> As part of these changes we also introduce a few other new predicates:
> 
> * pte_to_swp_entry_or_zero() - allows for convenient conversion from a PTE
>   to a swap entry if present, or an empty swap entry if none. This is
>   useful as many swap entry conversions are simply checking for flags for
>   which this suffices.
> 
> * get_pte_swap_entry() - Retrieves a PTE swap entry if it truly is a swap
>   entry (i.e. not a non-present entry), returning true if so, otherwise
>   returns false. This simplifies a lot of logic that previously open-coded
>   this.
> 
> * is_huge_pmd() - Determines if a PMD contains either a present transparent
>   huge page entry or a huge non-present entry. This again simplifies a lot
>   of logic that simply open-coded this.
> 
> REVIEWERS NOTE:
> 
> This series applies against mm-unstable as there are currently conflicts
> with mm-new. Should the series receive community assent I will resolve
> these at the point the RFC tag is removed.
> 
> I also intend to use this as a foundation for further work to add higher
> order page table markers.
> 
> Lorenzo Stoakes (12):
>   mm: introduce and use pte_to_swp_entry_or_zero()
>   mm: avoid unnecessary uses of is_swap_pte()
>   mm: introduce get_pte_swap_entry() and use it
>   mm: use get_pte_swap_entry() in debug pgtable + remove is_swap_pte()
>   fs/proc/task_mmu: refactor pagemap_pmd_range()
>   mm: avoid unnecessary use of is_swap_pmd()
>   mm: introduce is_huge_pmd() and use where appropriate
>   mm/huge_memory: refactor copy_huge_pmd() non-present logic
>   mm/huge_memory: refactor change_huge_pmd() non-present logic
>   mm: remove remaining is_swap_pmd() users and is_swap_pmd()
>   mm: rename non_swap_entry() to is_non_present_entry()
>   mm: provide is_swap_entry() and use it
> 
>  arch/s390/mm/gmap_helpers.c   |   2 +-
>  arch/s390/mm/pgtable.c        |   2 +-
>  fs/proc/task_mmu.c            | 214 ++++++++++++++++++++--------------
>  include/linux/huge_mm.h       |  49 +++++---
>  include/linux/swapops.h       |  99 ++++++++++++++--
>  include/linux/userfaultfd_k.h |  16 +--
>  mm/debug_vm_pgtable.c         |  43 ++++---
>  mm/filemap.c                  |   2 +-
>  mm/hmm.c                      |   2 +-
>  mm/huge_memory.c              | 189 ++++++++++++++++--------------
>  mm/hugetlb.c                  |   6 +-
>  mm/internal.h                 |  12 +-
>  mm/khugepaged.c               |  29 ++---
>  mm/madvise.c                  |  14 +--
>  mm/memory.c                   |  62 +++++-----
>  mm/migrate.c                  |   2 +-
>  mm/mincore.c                  |   2 +-
>  mm/mprotect.c                 |  45 ++++---
>  mm/mremap.c                   |   9 +-
>  mm/page_table_check.c         |  25 ++--
>  mm/page_vma_mapped.c          |  30 +++--
>  mm/swap_state.c               |   5 +-
>  mm/swapfile.c                 |   3 +-
>  mm/userfaultfd.c              |   2 +-
>  24 files changed, 511 insertions(+), 353 deletions(-)
> 
> --
> 2.51.0


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
  2025-10-24 20:05 ` [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion Yosry Ahmed
@ 2025-10-24 20:14   ` Lorenzo Stoakes
  0 siblings, 0 replies; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-24 20:14 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Andrew Morton, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, David Hildenbrand, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle,
	Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, Pedro Falcato,
	Pasha Tatashin, Rik van Riel, Harry Yoo, kvm, linux-s390,
	linux-kernel, linux-fsdevel, linux-mm

On Fri, Oct 24, 2025 at 08:05:33PM +0000, Yosry Ahmed wrote:
> On Fri, Oct 24, 2025 at 08:41:16AM +0100, Lorenzo Stoakes wrote:
> > There's an established convention in the kernel that we treat leaf page
> > tables (so far at the PTE, PMD level) as containing 'swap entries' should
> > they be neither empty (i.e. p**_none() evaluating true) nor present
> > (i.e. p**_present() evaluating true).
> >
> > However, at the same time we also have helper predicates - is_swap_pte(),
> > is_swap_pmd() - which are inconsistently used.
> >
> > This is problematic, as it is logical to assume that should somebody wish
> > to operate upon a page table swap entry they should first check to see if
> > it is in fact one.
> >
> > It also implies that perhaps, in future, we might introduce a non-present,
> > none page table entry that is not a swap entry.
> >
> > This series resolves this issue by systematically eliminating all use of
> > the is_swap_pte() and is swap_pmd() predicates so we retain only the
> > convention that should a leaf page table entry be neither none nor present
> > it is a swap entry.
> >
> > We also have the further issue that 'swap entry' is unfortunately a really
> > rather overloaded term and in fact refers to both entries for swap and for
> > other information such as migration entries, page table markers, and device
> > private entries.
> >
> > We therefore have the rather 'unique' concept of a 'non-swap' swap entry.
> >
> > This is deeply confusing, so this series goes further and eliminates the
> > non_swap_entry() predicate, replacing it with is_non_present_entry() - with
> > an eye to a new convention of referring to these non-swap 'swap entries' as
> > non-present.
>
> I just wanted to say THANK YOU for doing this. It is indeed a very
> annoying and confusing convention, and I wanted to do something about it
> in the past but never got around to it..

:) that's very kind of you to say thanks! I was motivated by pure irritation at
this situation after David and I discussed it extensively.

I was initially thinking we should consistently _use_ the predicate and was
arguing for it on review, but David pointed out the convention is quite the
opposite and it became apparent to me th the predicates were the issue.

Of course I have encountered this non-swap swap entry madness as all of us
in mm have, but fixing that ends up being a natural extension to resolving
the is_swap_xxx() stuff and a big relief to deal with also.

There's more work to be done but this addresses some of the more proximate
issues! :)

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 05/12] fs/proc/task_mmu: refactor pagemap_pmd_range()
  2025-10-24 19:12       ` Gregory Price
@ 2025-10-24 20:15         ` Lorenzo Stoakes
  2025-10-24 20:37           ` Gregory Price
  0 siblings, 1 reply; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-24 20:15 UTC (permalink / raw)
  To: Gregory Price
  Cc: Andrew Morton, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, David Hildenbrand, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle,
	Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Ying Huang, Alistair Popple, Pedro Falcato, Pasha Tatashin,
	Rik van Riel, Harry Yoo, kvm, linux-s390, linux-kernel,
	linux-fsdevel, linux-mm

On Fri, Oct 24, 2025 at 03:12:08PM -0400, Gregory Price wrote:
> On Fri, Oct 24, 2025 at 07:19:11PM +0100, Lorenzo Stoakes wrote:
> > On Fri, Oct 24, 2025 at 01:32:29PM -0400, Gregory Price wrote:
> >
> > A next step will be to at least rename swp_entry_t to something else, because
> > every last remnant of this 'swap entries but not really' needs to be dealt
> > with...
> >
>
> hah, was just complaining about this on the other patch.
>
> ptleaf_entry_t?

Well you kinda want to differentiate it from a normal present page table leaf,
but I really like 'non-present entry' as a description for (what were) non-swap
entries so that's out.

So maybe actually that isn't too bad of an idea...

Could also be

nonpresent_or_swap_t but that's kinda icky...

Naming is hard :)

>
> :shrug:
>
> keep fighting the good fight
> ~Gregory

Always sir! ;)

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 11/12] mm: rename non_swap_entry() to is_non_present_entry()
  2025-10-24 19:07   ` Gregory Price
@ 2025-10-24 20:17     ` Lorenzo Stoakes
  0 siblings, 0 replies; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-24 20:17 UTC (permalink / raw)
  To: Gregory Price
  Cc: Andrew Morton, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, David Hildenbrand, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle,
	Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Ying Huang, Alistair Popple, Pedro Falcato, Pasha Tatashin,
	Rik van Riel, Harry Yoo, kvm, linux-s390, linux-kernel,
	linux-fsdevel, linux-mm

On Fri, Oct 24, 2025 at 03:07:40PM -0400, Gregory Price wrote:
> On Fri, Oct 24, 2025 at 08:41:27AM +0100, Lorenzo Stoakes wrote:
> > Referring to non-swap swap entries is simply confusing. While we store
> > non-present entries unrelated to swap itself, in swp_entry_t fields, we can
> > avoid referring to them as 'non-swap' entries.
> >
> --- >8
> >  static void ptep_zap_swap_entry(struct mm_struct *mm, swp_entry_t entry)
> >  {
> > -	if (!non_swap_entry(entry))
> > +	if (!is_non_present_entry(entry))
> >  		dec_mm_counter(mm, MM_SWAPENTS);
>
> I guess the question I have here is whether it's feasible to invert the
> logic to avoid the double-negative not-logic.

Ah well, see the next patch :)

>
> Anyway, naming is hard. In general I appreciate the additional clarity,
> even if we still have some `!is_non_*` logic sprinkled about.

Again this is a prelude to 12/12 :P

>
> --- addt'l aside semi-unrelated to your patches
>
> I can see where this is going in the long run, but the name of this
> function (ptep_zap_swap_entry) is as frustrating as the check for
> non_swap_entry(entry).
>
> may as well call it `ptep_zap_leaf_thingy` if it's handling multiple
> special entry types.
>
> but renaming even more functions in strange places outside scope here.

Yeah there's more to do for sure, I'll add this to the list... :)

>
> ---
>
> ~Gregory

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 05/12] fs/proc/task_mmu: refactor pagemap_pmd_range()
  2025-10-24 20:15         ` Lorenzo Stoakes
@ 2025-10-24 20:37           ` Gregory Price
  2025-10-27 15:26             ` Lorenzo Stoakes
  2025-10-27 16:11             ` Jason Gunthorpe
  0 siblings, 2 replies; 47+ messages in thread
From: Gregory Price @ 2025-10-24 20:37 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, David Hildenbrand, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle,
	Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Ying Huang, Alistair Popple, Pedro Falcato, Pasha Tatashin,
	Rik van Riel, Harry Yoo, kvm, linux-s390, linux-kernel,
	linux-fsdevel, linux-mm

On Fri, Oct 24, 2025 at 09:15:59PM +0100, Lorenzo Stoakes wrote:
> On Fri, Oct 24, 2025 at 03:12:08PM -0400, Gregory Price wrote:
> 
> So maybe actually that isn't too bad of an idea...
> 
> Could also be
> 
> nonpresent_or_swap_t but that's kinda icky...

clearly we need:

union {
	swp_entry_t swap;
	nonpresent_entry_t np;
	pony_entry_t pony;
	plum_emtry_t beer;
} leaf_entry_t;

with

leaf_type whats_that_pte(leaf_entry_t);

with 20 more new functions about how to manage leaf_entries ;]

no not seriously, please have a good weekend!

and thanks again for doing this!
~Gregory


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 05/12] fs/proc/task_mmu: refactor pagemap_pmd_range()
  2025-10-24 20:37           ` Gregory Price
@ 2025-10-27 15:26             ` Lorenzo Stoakes
  2025-10-27 16:11             ` Jason Gunthorpe
  1 sibling, 0 replies; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-27 15:26 UTC (permalink / raw)
  To: Gregory Price
  Cc: Andrew Morton, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, David Hildenbrand, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle,
	Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Jason Gunthorpe, Leon Romanovsky, Muchun Song, Oscar Salvador,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Jann Horn, Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Ying Huang, Alistair Popple, Pedro Falcato, Pasha Tatashin,
	Rik van Riel, Harry Yoo, kvm, linux-s390, linux-kernel,
	linux-fsdevel, linux-mm

On Fri, Oct 24, 2025 at 04:37:18PM -0400, Gregory Price wrote:
> On Fri, Oct 24, 2025 at 09:15:59PM +0100, Lorenzo Stoakes wrote:
> > On Fri, Oct 24, 2025 at 03:12:08PM -0400, Gregory Price wrote:
> >
> > So maybe actually that isn't too bad of an idea...
> >
> > Could also be
> >
> > nonpresent_or_swap_t but that's kinda icky...
>
> clearly we need:
>
> union {
> 	swp_entry_t swap;
> 	nonpresent_entry_t np;
> 	pony_entry_t pony;
> 	plum_emtry_t beer;
> } leaf_entry_t;
>
> with
>
> leaf_type whats_that_pte(leaf_entry_t);
>
> with 20 more new functions about how to manage leaf_entries ;]

:))))

Hm well the union is an amusing thought but more seriously leaf_entry_t
seems pretty good name-wise actually :)

>
> no not seriously, please have a good weekend!

Thanks :) Hope you had a good one too!

>
> and thanks again for doing this!

Thanks again :)

> ~Gregory

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
  2025-10-24  7:41 [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion Lorenzo Stoakes
                   ` (12 preceding siblings ...)
  2025-10-24 20:05 ` [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion Yosry Ahmed
@ 2025-10-27 16:09 ` Jason Gunthorpe
  2025-10-27 17:33   ` Lorenzo Stoakes
  2025-10-27 23:32   ` Gregory Price
  13 siblings, 2 replies; 47+ messages in thread
From: Jason Gunthorpe @ 2025-10-27 16:09 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, David Hildenbrand, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle,
	Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Leon Romanovsky, Muchun Song, Oscar Salvador, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
	Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, Pedro Falcato,
	Pasha Tatashin, Rik van Riel, Harry Yoo, kvm, linux-s390,
	linux-kernel, linux-fsdevel, linux-mm

On Fri, Oct 24, 2025 at 08:41:16AM +0100, Lorenzo Stoakes wrote:
> There's an established convention in the kernel that we treat leaf page
> tables (so far at the PTE, PMD level) as containing 'swap entries' should
> they be neither empty (i.e. p**_none() evaluating true) nor present
> (i.e. p**_present() evaluating true).

I have to say I've never liked the none-vs-present naming either.

> This is deeply confusing, so this series goes further and eliminates the
> non_swap_entry() predicate, replacing it with is_non_present_entry() - with
> an eye to a new convention of referring to these non-swap 'swap entries' as
> non-present.

I'm not keen on is_non_present_entry(), it seems confusing again.

It looks like we are stuck with swp_entry_t as the being the handle
for a non-present pte. Oh well, not a great name, but fine..

So we think of that swp_entry_t having multiple types: swap, migration,
device private, etc, etc

Then I'd think the general pattern should be to get a swp_entry_t:

    if (pte_present(pte))
        return;
    swpent = pte_to_swp_entry(pte);

And then evaluate the type:

    if (swpent_is_swap()) {
    }


If you keep the naming as "swp_entry" indicates the multi-type value,
then "swap" can mean a swp_entry which is used by the swap subsystem.

That suggests functions like this:

swpent_is_swap()
swpent_is_migration()
..

and your higher level helpers like:

/* True if the pte is a swpent_is_swap() */
static inline bool swpent_get_swap_pte(pte_t pte, swp_entry_t *entryp)
{
   if (pte_present(pte))
        return false;
   *swpent = pte_to_swp_entry(pte);
   return swpent_is_swap(*swpent);
}

I also think it will be more readable to keep all these things under a
swpent namespace instead of using unstructured english names.

> * pte_to_swp_entry_or_zero() - allows for convenient conversion from a PTE
>   to a swap entry if present, or an empty swap entry if none. This is
>   useful as many swap entry conversions are simply checking for flags for
>   which this suffices.

I'd expect a safe function should be more like

   *swpent = pte_to_swp_entry_safe(pte);
   return swpent_is_swap(*swpent);

Where "safe" means that if the PTE is None or Present then
swpent_is_XX() == false. Ie it returns a 0 swpent and 0 swpent is
always nothing.

> * get_pte_swap_entry() - Retrieves a PTE swap entry if it truly is a swap
>   entry (i.e. not a non-present entry), returning true if so, otherwise
>   returns false. This simplifies a lot of logic that previously open-coded
>   this.

Like this is still a tortured function:

+static inline bool get_pte_swap_entry(pte_t pte, swp_entry_t *entryp)
+{
+       if (pte_present(pte))
+               return false;
+       if (pte_none(pte))
+               return false;
+
+       *entryp = pte_to_swp_entry(pte);
+       if (non_swap_entry(*entryp))
+               return false;
+
+       return true;
+}
+

static inline bool get_pte_swap_entry(pte_t pte, swp_entry_t *entryp)
{
   return swpent_is_swap(*swpent = pte_to_swp_entry_safe(pte));
}

Maybe it doesn't even need an inline at that point?

> * is_huge_pmd() - Determines if a PMD contains either a present transparent
>   huge page entry or a huge non-present entry. This again simplifies a lot
>   of logic that simply open-coded this.

is_huge_or_swpent_pmd() would be nicer, IMHO. I think it is surprising
when any of these APIs accept swap entries without being explicit

Jason


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 05/12] fs/proc/task_mmu: refactor pagemap_pmd_range()
  2025-10-24 20:37           ` Gregory Price
  2025-10-27 15:26             ` Lorenzo Stoakes
@ 2025-10-27 16:11             ` Jason Gunthorpe
  2025-10-27 16:15               ` David Hildenbrand
  2025-10-27 16:26               ` Lorenzo Stoakes
  1 sibling, 2 replies; 47+ messages in thread
From: Jason Gunthorpe @ 2025-10-27 16:11 UTC (permalink / raw)
  To: Gregory Price
  Cc: Lorenzo Stoakes, Andrew Morton, Christian Borntraeger,
	Janosch Frank, Claudio Imbrenda, David Hildenbrand,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens,
	Vasily Gorbik, Sven Schnelle, Zi Yan, Baolin Wang,
	Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Kemeng Shi, Kairui Song, Nhat Pham, Baoquan He,
	Chris Li, Peter Xu, Matthew Wilcox, Leon Romanovsky, Muchun Song,
	Oscar Salvador, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Jann Horn, Matthew Brost,
	Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
	Alistair Popple, Pedro Falcato, Pasha Tatashin, Rik van Riel,
	Harry Yoo, kvm, linux-s390, linux-kernel, linux-fsdevel,
	linux-mm

On Fri, Oct 24, 2025 at 04:37:18PM -0400, Gregory Price wrote:
> On Fri, Oct 24, 2025 at 09:15:59PM +0100, Lorenzo Stoakes wrote:
> > On Fri, Oct 24, 2025 at 03:12:08PM -0400, Gregory Price wrote:
> > 
> > So maybe actually that isn't too bad of an idea...
> > 
> > Could also be
> > 
> > nonpresent_or_swap_t but that's kinda icky...
> 
> clearly we need:
> 
> union {
> 	swp_entry_t swap;
> 	nonpresent_entry_t np;
> 	pony_entry_t pony;
> 	plum_emtry_t beer;
> } leaf_entry_t;
> 
> with
> 
> leaf_type whats_that_pte(leaf_entry_t);

I think if you are going to try to rename swp_entry_t that is a pretty
good idea. Maybe swleaf_entry_t to pace emphasis that it is not used
by the HW page table walker would be a good compromise to the ugly
'non-present entry' term.

Jason


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 05/12] fs/proc/task_mmu: refactor pagemap_pmd_range()
  2025-10-27 16:11             ` Jason Gunthorpe
@ 2025-10-27 16:15               ` David Hildenbrand
  2025-10-27 16:26               ` Lorenzo Stoakes
  1 sibling, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2025-10-27 16:15 UTC (permalink / raw)
  To: Jason Gunthorpe, Gregory Price
  Cc: Lorenzo Stoakes, Andrew Morton, Christian Borntraeger,
	Janosch Frank, Claudio Imbrenda, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle,
	Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Leon Romanovsky, Muchun Song, Oscar Salvador, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
	Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Ying Huang, Alistair Popple, Pedro Falcato, Pasha Tatashin,
	Rik van Riel, Harry Yoo, kvm, linux-s390, linux-kernel,
	linux-fsdevel, linux-mm

On 27.10.25 17:11, Jason Gunthorpe wrote:
> On Fri, Oct 24, 2025 at 04:37:18PM -0400, Gregory Price wrote:
>> On Fri, Oct 24, 2025 at 09:15:59PM +0100, Lorenzo Stoakes wrote:
>>> On Fri, Oct 24, 2025 at 03:12:08PM -0400, Gregory Price wrote:
>>>
>>> So maybe actually that isn't too bad of an idea...
>>>
>>> Could also be
>>>
>>> nonpresent_or_swap_t but that's kinda icky...
>>
>> clearly we need:
>>
>> union {
>> 	swp_entry_t swap;
>> 	nonpresent_entry_t np;
>> 	pony_entry_t pony;
>> 	plum_emtry_t beer;
>> } leaf_entry_t;
>>
>> with
>>
>> leaf_type whats_that_pte(leaf_entry_t);
> 
> I think if you are going to try to rename swp_entry_t that is a pretty
> good idea. Maybe swleaf_entry_t to pace emphasis that it is not used
> by the HW page table walker would be a good compromise to the ugly
> 'non-present entry' term.

Something like that would work for me.

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 05/12] fs/proc/task_mmu: refactor pagemap_pmd_range()
  2025-10-27 16:11             ` Jason Gunthorpe
  2025-10-27 16:15               ` David Hildenbrand
@ 2025-10-27 16:26               ` Lorenzo Stoakes
  2025-10-27 16:31                 ` David Hildenbrand
  2025-10-27 16:38                 ` Gregory Price
  1 sibling, 2 replies; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-27 16:26 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Gregory Price, Andrew Morton, Christian Borntraeger,
	Janosch Frank, Claudio Imbrenda, David Hildenbrand,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens,
	Vasily Gorbik, Sven Schnelle, Zi Yan, Baolin Wang,
	Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Kemeng Shi, Kairui Song, Nhat Pham, Baoquan He,
	Chris Li, Peter Xu, Matthew Wilcox, Leon Romanovsky, Muchun Song,
	Oscar Salvador, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Jann Horn, Matthew Brost,
	Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
	Alistair Popple, Pedro Falcato, Pasha Tatashin, Rik van Riel,
	Harry Yoo, kvm, linux-s390, linux-kernel, linux-fsdevel,
	linux-mm

On Mon, Oct 27, 2025 at 01:11:46PM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 24, 2025 at 04:37:18PM -0400, Gregory Price wrote:
> > On Fri, Oct 24, 2025 at 09:15:59PM +0100, Lorenzo Stoakes wrote:
> > > On Fri, Oct 24, 2025 at 03:12:08PM -0400, Gregory Price wrote:
> > >
> > > So maybe actually that isn't too bad of an idea...
> > >
> > > Could also be
> > >
> > > nonpresent_or_swap_t but that's kinda icky...
> >
> > clearly we need:
> >
> > union {
> > 	swp_entry_t swap;
> > 	nonpresent_entry_t np;
> > 	pony_entry_t pony;
> > 	plum_emtry_t beer;
> > } leaf_entry_t;

I think Greg meant this as a joke [correct me if wrong] :) that was my
impression anyway (see original end of email...)

> >
> > with
> >
> > leaf_type whats_that_pte(leaf_entry_t);
>
> I think if you are going to try to rename swp_entry_t that is a pretty

Will reply elsewhere, but yes that's the intent.

> good idea. Maybe swleaf_entry_t to pace emphasis that it is not used

I get the point but that's kinda a horrible name visually.

sw_leaf_entry_t too... yeah maybe we can just put the software bit in a comment
maybe :)


> by the HW page table walker would be a good compromise to the ugly
> 'non-present entry' term.

I like leaf_entry_t name-wise.

I don't love the union.

How would we determine what type it is, we'd have to have some
generic_leaf_entry_t type or something to contain the swap type field and then
cast and... is it worth it?

Intent of non-present was to refer to not-swap swapentry. It's already a
convention that exists, e.g. is_pmd_non_present_folio_entry().

>
> Jason
>

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 05/12] fs/proc/task_mmu: refactor pagemap_pmd_range()
  2025-10-27 16:26               ` Lorenzo Stoakes
@ 2025-10-27 16:31                 ` David Hildenbrand
  2025-10-27 16:38                   ` Lorenzo Stoakes
  2025-10-27 16:38                 ` Gregory Price
  1 sibling, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2025-10-27 16:31 UTC (permalink / raw)
  To: Lorenzo Stoakes, Jason Gunthorpe
  Cc: Gregory Price, Andrew Morton, Christian Borntraeger,
	Janosch Frank, Claudio Imbrenda, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle,
	Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Leon Romanovsky, Muchun Song, Oscar Salvador, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
	Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Ying Huang, Alistair Popple, Pedro Falcato, Pasha Tatashin,
	Rik van Riel, Harry Yoo, kvm, linux-s390, linux-kernel,
	linux-fsdevel, linux-mm

> 
> I don't love the union.
> 
> How would we determine what type it is, we'd have to have some
> generic_leaf_entry_t type or something to contain the swap type field and then
> cast and... is it worth it?
> 
> Intent of non-present was to refer to not-swap swapentry. It's already a
> convention that exists, e.g. is_pmd_non_present_folio_entry().

Just noting that this was a recent addition (still not upstream) that 
essentially says "there is a folio here, but it's not in an ordinary 
present page table entry.

So we could change that to something better.

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 05/12] fs/proc/task_mmu: refactor pagemap_pmd_range()
  2025-10-27 16:31                 ` David Hildenbrand
@ 2025-10-27 16:38                   ` Lorenzo Stoakes
  2025-10-27 17:08                     ` Alexander Gordeev
  2025-10-28 12:52                     ` Jason Gunthorpe
  0 siblings, 2 replies; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-27 16:38 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Jason Gunthorpe, Gregory Price, Andrew Morton,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens,
	Vasily Gorbik, Sven Schnelle, Zi Yan, Baolin Wang,
	Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Kemeng Shi, Kairui Song, Nhat Pham, Baoquan He,
	Chris Li, Peter Xu, Matthew Wilcox, Leon Romanovsky, Muchun Song,
	Oscar Salvador, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Jann Horn, Matthew Brost,
	Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
	Alistair Popple, Pedro Falcato, Pasha Tatashin, Rik van Riel,
	Harry Yoo, kvm, linux-s390, linux-kernel, linux-fsdevel,
	linux-mm

On Mon, Oct 27, 2025 at 05:31:54PM +0100, David Hildenbrand wrote:
> >
> > I don't love the union.
> >
> > How would we determine what type it is, we'd have to have some
> > generic_leaf_entry_t type or something to contain the swap type field and then
> > cast and... is it worth it?
> >
> > Intent of non-present was to refer to not-swap swapentry. It's already a
> > convention that exists, e.g. is_pmd_non_present_folio_entry().
>
> Just noting that this was a recent addition (still not upstream) that
> essentially says "there is a folio here, but it's not in an ordinary present
> page table entry.
>
> So we could change that to something better.

Yeah but leaf_entry_t encapsulates BOTH swap and non-swap entries.

So that's nice.

What do you propose calling non-swap leaf entries? It starts spiralling down a
bit there.

And it's really common to have logic asserting it's actually a swap entry
vs. not etc.

I think we need to separate out what's practical for this series and what's
ideal going forwards.

Right now it's a complete mess, I don't want this to turn into a talking shop
that bogs things down so we don't move forward because it doesn't address
everything all at once.

What we have here already improves things dramatically IMO.

So what I propose is:

1. we keep the non-present terminology as a better way of referring to non-swap
   entries.

2. When I come to do the leaf_entry_t series, we can generalise and no longer
   differentiate between swap + non-swap.

We can then resume the discussion re: type there.

>
> --
> Cheers
>
> David / dhildenb
>

THanks, Lorenzo


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 05/12] fs/proc/task_mmu: refactor pagemap_pmd_range()
  2025-10-27 16:26               ` Lorenzo Stoakes
  2025-10-27 16:31                 ` David Hildenbrand
@ 2025-10-27 16:38                 ` Gregory Price
  1 sibling, 0 replies; 47+ messages in thread
From: Gregory Price @ 2025-10-27 16:38 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Jason Gunthorpe, Andrew Morton, Christian Borntraeger,
	Janosch Frank, Claudio Imbrenda, David Hildenbrand,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens,
	Vasily Gorbik, Sven Schnelle, Zi Yan, Baolin Wang,
	Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Kemeng Shi, Kairui Song, Nhat Pham, Baoquan He,
	Chris Li, Peter Xu, Matthew Wilcox, Leon Romanovsky, Muchun Song,
	Oscar Salvador, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Jann Horn, Matthew Brost,
	Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
	Alistair Popple, Pedro Falcato, Pasha Tatashin, Rik van Riel,
	Harry Yoo, kvm, linux-s390, linux-kernel, linux-fsdevel,
	linux-mm

On Mon, Oct 27, 2025 at 04:26:54PM +0000, Lorenzo Stoakes wrote:
> On Mon, Oct 27, 2025 at 01:11:46PM -0300, Jason Gunthorpe wrote:
> > On Fri, Oct 24, 2025 at 04:37:18PM -0400, Gregory Price wrote:
> > > On Fri, Oct 24, 2025 at 09:15:59PM +0100, Lorenzo Stoakes wrote:
> > > > On Fri, Oct 24, 2025 at 03:12:08PM -0400, Gregory Price wrote:
> > > >
> > > > So maybe actually that isn't too bad of an idea...
> > > >
> > > > Could also be
> > > >
> > > > nonpresent_or_swap_t but that's kinda icky...
> > >
> > > clearly we need:
> > >
> > > union {
> > > 	swp_entry_t swap;
> > > 	nonpresent_entry_t np;
> > > 	pony_entry_t pony;
> > > 	plum_emtry_t beer;
> > > } leaf_entry_t;
> 
> I think Greg meant this as a joke [correct me if wrong] :) that was my
> impression anyway (see original end of email...)
>
> I like leaf_entry_t name-wise.
> 
> I don't love the union.
>

The union was definitely a joke - see `plum_entry_t beer`

There definitely shouldn't be enough extensions to warrant a union here,
that seems like negative value.

leaf_entry_t naming replacing swp_entry_t seems reasonable since that's
basically all swp_entry_t is in its current form - even according to the
this set's cover letter:

```
  There's an established convention in the kernel that we treat leaf page
  tables (so far at the PTE, PMD level) as containing 'swap entries' should
  they be neither empty (i.e. p**_none() evaluating true) nor present
  (i.e. p**_present() evaluating true).
```

~Gregory


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 05/12] fs/proc/task_mmu: refactor pagemap_pmd_range()
  2025-10-27 16:38                   ` Lorenzo Stoakes
@ 2025-10-27 17:08                     ` Alexander Gordeev
  2025-10-28 12:52                     ` Jason Gunthorpe
  1 sibling, 0 replies; 47+ messages in thread
From: Alexander Gordeev @ 2025-10-27 17:08 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: David Hildenbrand, Jason Gunthorpe, Gregory Price, Andrew Morton,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle,
	Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Leon Romanovsky, Muchun Song, Oscar Salvador, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
	Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Ying Huang, Alistair Popple, Pedro Falcato, Pasha Tatashin,
	Rik van Riel, Harry Yoo, kvm, linux-s390, linux-kernel,
	linux-fsdevel, linux-mm

On Mon, Oct 27, 2025 at 04:38:05PM +0000, Lorenzo Stoakes wrote:
> Yeah but leaf_entry_t encapsulates BOTH swap and non-swap entries.
> 
> So that's nice.
> 
> What do you propose calling non-swap leaf entries? It starts spiralling down a
> bit there.

Absent?


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
  2025-10-27 16:09 ` Jason Gunthorpe
@ 2025-10-27 17:33   ` Lorenzo Stoakes
  2025-10-28 12:48     ` Jason Gunthorpe
  2025-10-27 23:32   ` Gregory Price
  1 sibling, 1 reply; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-27 17:33 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andrew Morton, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, David Hildenbrand, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle,
	Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Leon Romanovsky, Muchun Song, Oscar Salvador, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
	Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, Pedro Falcato,
	Pasha Tatashin, Rik van Riel, Harry Yoo, kvm, linux-s390,
	linux-kernel, linux-fsdevel, linux-mm

(Note I never intended this to be an RFC, it was only because of
series-likely-to-be-dropped causing nasty conflicts this isn't an 'out
there' series rather a practical submission).

To preface, as I said elsewhere, I intend to do more on this, renaming
swp_entry_t to probably leaf_entry_t (thanks Gregory!)

The issue is no matter how I do this people will theorise different
approaches, I'm trying to practically find a way forward that works
iteratively.

There are 502 lines referencing swp_entry_t in the kernel.

Leaving _that_ to another series seems sensible to me.

On Mon, Oct 27, 2025 at 01:09:23PM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 24, 2025 at 08:41:16AM +0100, Lorenzo Stoakes wrote:
> > There's an established convention in the kernel that we treat leaf page
> > tables (so far at the PTE, PMD level) as containing 'swap entries' should
> > they be neither empty (i.e. p**_none() evaluating true) nor present
> > (i.e. p**_present() evaluating true).
>
> I have to say I've never liked the none-vs-present naming either.

OK.

I think that's not something we can reasonably get away from practically at
this point.

I don't love 'none' as a thing. Empty would be better but you know
naming... hard.

>
> > This is deeply confusing, so this series goes further and eliminates the
> > non_swap_entry() predicate, replacing it with is_non_present_entry() - with
> > an eye to a new convention of referring to these non-swap 'swap entries' as
> > non-present.
>
> I'm not keen on is_non_present_entry(), it seems confusing again.

What is confusing about it?

The idea is we don't mention swap. If something is non-present and not swap
then it's... a non-present entry.

The logic generally does need to differentiate between the two.

Otherwise we're back to referencing swap again...

>
> It looks like we are stuck with swp_entry_t as the being the handle
> for a non-present pte. Oh well, not a great name, but fine..

We're not stuck with it, I'm just doing things a step at a time. I fully
intend to rename it.

Perfect is the enemy of the good, and this series improves things a lot.

>
> So we think of that swp_entry_t having multiple types: swap, migration,
> device private, etc, etc

I mean do we use fields in the swap entry significantly differently in each
of these types?

I think any change like that would need to be a follow-up series because
that'd involve rewriting a lot of code...

If there's _good reason_ to and we can sensibly stick a like struct entry
in the union we can rename swp_type to leaf_type or something that could be
nice actually.

But maybe make opaque...

But only if there's enough variation in the 'shape' of the data in the swap
entry between different types.

I would need to go check... because if not, then there's no point really.

>
> Then I'd think the general pattern should be to get a swp_entry_t:
>
>     if (pte_present(pte))
>         return;
>     swpent = pte_to_swp_entry(pte);
>
> And then evaluate the type:
>
>     if (swpent_is_swap()) {
>     }

That just embeds the confusion re: swap entry in the function name, no
that's horrible?

We may as well keep non_swap_entry() if we do that right?

Again, I intend to rename swap_entry_t. I'm not doing everything at once
because it's simply not practical.

>
>
> If you keep the naming as "swp_entry" indicates the multi-type value,

Nope don't intend to keep.

> then "swap" can mean a swp_entry which is used by the swap subsystem.
>
> That suggests functions like this:
>
> swpent_is_swap()
> swpent_is_migration()
> ..

The _whole point_ of this series is to separate out the idea that you're
dealing with swap entries so I don't like swpent as a name obviously.

Once we do the rename, we can do something like leafent_is_swap() etc.

And agreed that's neater and more consistent. We'd want to rename all swap
predicates similarly.

We could also just pre-empt and prefix functions with leafent_is_swap() if
you prefer.

We could even do:

/* TODO: Rename swap_entry_t to leaf_entry_t */
typedef swap_entry_t leaf_entry_t;

And use the new type right away.

What do you think?

>
> and your higher level helpers like:
>
> /* True if the pte is a swpent_is_swap() */
> static inline bool swpent_get_swap_pte(pte_t pte, swp_entry_t *entryp)
> {
>    if (pte_present(pte))
>         return false;
>    *swpent = pte_to_swp_entry(pte);
>    return swpent_is_swap(*swpent);
> }

I already implement in the series a pte_to_swp_entry_or_zero() function
that goes one further - checks pte_present() for you, if pte_none() you
just get an empty swap entry, so this can be:

static inline bool get_pte_swap_entry(pte_t, swp_entry_t *entryp)
{
	*entryp = pte_to_swp_entry_or_zero(pte);
	return is_swap_entry(*entryp);
}

Which is a valid refactoring suggestion but the building blocks are
_already in the series_.

I mean that's valid feedback, that's much nicer, will refactor thanks.

>
> I also think it will be more readable to keep all these things under a
> swpent namespace instead of using unstructured english names.

Nope. Again, the whole point of the series is to avoid referencing
swap. swpent_xxx() is just eliminating the purpose of the series right?

Yes it sucks that the type name is what it is, but this is an iterative
process.

But as above, we could pre-empt future changes and prefix with a
leafent_*() prefix if that works for you?

>
> > * pte_to_swp_entry_or_zero() - allows for convenient conversion from a PTE
> >   to a swap entry if present, or an empty swap entry if none. This is
> >   useful as many swap entry conversions are simply checking for flags for
> >   which this suffices.
>
> I'd expect a safe function should be more like
>
>    *swpent = pte_to_swp_entry_safe(pte);
>    return swpent_is_swap(*swpent);
>
> Where "safe" means that if the PTE is None or Present then
> swpent_is_XX() == false. Ie it returns a 0 swpent and 0 swpent is
> always nothing.

Not sure it's really 'safe', the name is unfortunate, but you could read
this as 'always get a valid swap entry to operate on'...

I like to encode the fact we are treating pte_none() this way.

pte_to_leaf_entry() is possible?

Or

leaf_entry_t leafent_from_pte()...?

>
> > * get_pte_swap_entry() - Retrieves a PTE swap entry if it truly is a swap
> >   entry (i.e. not a non-present entry), returning true if so, otherwise
> >   returns false. This simplifies a lot of logic that previously open-coded
> >   this.
>
> Like this is still a tortured function:

This is moot (also not the final version ofc as I get rid of
non_swap_entry()), I agree the approach you suggest is better as per above
so we'll go with that (though without the horrid embeded assignment bit).

>
> +static inline bool get_pte_swap_entry(pte_t pte, swp_entry_t *entryp)
> +{
> +       if (pte_present(pte))
> +               return false;
> +       if (pte_none(pte))
> +               return false;
> +
> +       *entryp = pte_to_swp_entry(pte);
> +       if (non_swap_entry(*entryp))
> +               return false;
> +
> +       return true;
> +}
> +
>
> static inline bool get_pte_swap_entry(pte_t pte, swp_entry_t *entryp)
> {
>    return swpent_is_swap(*swpent = pte_to_swp_entry_safe(pte));
> }

I absolutely hate that embedded assignment, but this is equivalent to what
I suggested above, so agreed this is a good suggestion broadly.

>
> Maybe it doesn't even need an inline at that point?

Don't understand what you mean by that. It's in a header file?

>
> > * is_huge_pmd() - Determines if a PMD contains either a present transparent
> >   huge page entry or a huge non-present entry. This again simplifies a lot
> >   of logic that simply open-coded this.
>
> is_huge_or_swpent_pmd() would be nicer, IMHO. I think it is surprising
> when any of these APIs accept swap entries without being explicit

Again, I'm not going to reference swap in a series intended to eliminate
this, it defeats the purpose.

And the non-present (or whatever you want to call it) entry _is_ huge. So
it's just adding more confusion that way IMO.

>
> Jason

Thanks, Lorenzo


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
  2025-10-27 16:09 ` Jason Gunthorpe
  2025-10-27 17:33   ` Lorenzo Stoakes
@ 2025-10-27 23:32   ` Gregory Price
  1 sibling, 0 replies; 47+ messages in thread
From: Gregory Price @ 2025-10-27 23:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Lorenzo Stoakes, Andrew Morton, Christian Borntraeger,
	Janosch Frank, Claudio Imbrenda, David Hildenbrand,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens,
	Vasily Gorbik, Sven Schnelle, Zi Yan, Baolin Wang,
	Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Kemeng Shi, Kairui Song, Nhat Pham, Baoquan He,
	Chris Li, Peter Xu, Matthew Wilcox, Leon Romanovsky, Muchun Song,
	Oscar Salvador, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Jann Horn, Matthew Brost,
	Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
	Alistair Popple, Pedro Falcato, Pasha Tatashin, Rik van Riel,
	Harry Yoo, kvm, linux-s390, linux-kernel, linux-fsdevel,
	linux-mm

On Mon, Oct 27, 2025 at 01:09:23PM -0300, Jason Gunthorpe wrote:
> 
> I'm not keen on is_non_present_entry(), it seems confusing again.
> 

The confusion stems from `present` referring to the state of the hardware
PTE bits, instead of referring to the state of the entry.

But even if we're stuck with "non-present entry", it's still infinitely
more understandable (and teachable) than "non_swap_swap_entry".

So even if we never get to the point of replacing swp_entry_t, this is a
clear and obvious improvement.
~Gregory


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
  2025-10-27 17:33   ` Lorenzo Stoakes
@ 2025-10-28 12:48     ` Jason Gunthorpe
  2025-10-28 18:20       ` Lorenzo Stoakes
  0 siblings, 1 reply; 47+ messages in thread
From: Jason Gunthorpe @ 2025-10-28 12:48 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, David Hildenbrand, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle,
	Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Leon Romanovsky, Muchun Song, Oscar Salvador, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
	Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, Pedro Falcato,
	Pasha Tatashin, Rik van Riel, Harry Yoo, kvm, linux-s390,
	linux-kernel, linux-fsdevel, linux-mm

On Mon, Oct 27, 2025 at 05:33:57PM +0000, Lorenzo Stoakes wrote:
> (Note I never intended this to be an RFC, it was only because of
> series-likely-to-be-dropped causing nasty conflicts this isn't an 'out
> there' series rather a practical submission).
> 
> To preface, as I said elsewhere, I intend to do more on this, renaming
> swp_entry_t to probably leaf_entry_t (thanks Gregory!)
> 
> The issue is no matter how I do this people will theorise different
> approaches, I'm trying to practically find a way forward that works
> iteratively.

It is why I suggested that swp_entry_t is the name we have (for this
series at least) and lean into it as the proper name for the abstract
idea of a multi-type'd value. Having a following series to rename
"swp_entry_t" to some "leaf entry" will resolve the poor naming.

But for now, "swp_entry_t" does not mean *swap* entry, it means "leaf
entry with a really bad type name".

And swpent_* is the namespace prefix for things dealing with
swp_entry_t.

If done consistently then the switch to leaf entry naming is just a
simple mass rename of swpent/leafent.

> > That suggests functions like this:
> >
> > swpent_is_swap()
> > swpent_is_migration()
> > ..
> 
> The _whole point_ of this series is to separate out the idea that you're
> dealing with swap entries so I don't like swpent as a name obviously.

As you say we can't fix everything at once, but if you do the above
and then rename the end state would be

leafent_is_swap()
leafent_is_migration()
 ..

And that seems like a good end state.

So pick the small steps, either lean into swpent in this series as the
place holder for leafent in the next..

Or this seems like a good idea too:

> We could also just pre-empt and prefix functions with leafent_is_swap() if
> you prefer.
> 
> We could even do:
> 
> /* TODO: Rename swap_entry_t to leaf_entry_t */
> typedef swap_entry_t leaf_entry_t;
>
> And use the new type right away.

Then the followup series is cleaning away swap_entry_t as a name.

> > /* True if the pte is a swpent_is_swap() */
> > static inline bool swpent_get_swap_pte(pte_t pte, swp_entry_t *entryp)
> > {
> >    if (pte_present(pte))
> >         return false;
> >    *swpent = pte_to_swp_entry(pte);
> >    return swpent_is_swap(*swpent);
> > }
> 
> I already implement in the series a pte_to_swp_entry_or_zero() function

I saw, but I don't think it is a great name.. It doesn't really give
"zero" it gives a swp_entry_t that doesn't pass any of the
swpent_is_XX() functions. ie a none type.

> that goes one further - checks pte_present() for you, if pte_none() you
> just get an empty swap entry, so this can be:

And I was hoping to see a path to get rid of the pte_none() stuff, or
at least on most arches. It is pretty pointless to check for pte_none
if the arch has a none-pte that already is 0..

So pte_none can be more like:
   swpent_is_none(pte_to_swp_entry(pte))

Where pte_to_swp_entry is just some bit maths with no conditionals.

> > I also think it will be more readable to keep all these things under a
> > swpent namespace instead of using unstructured english names.
> 
> Nope. Again, the whole point of the series is to avoid referencing
> swap. swpent_xxx() is just eliminating the purpose of the series right?
> 
> Yes it sucks that the type name is what it is, but this is an iterative
> process.

Sure, but don't add a bunch of new names with *no namespace*. As above
either accept swpent is a placeholder for leafent in the next series,
or do this:

> But as above, we could pre-empt future changes and prefix with a
> leafent_*() prefix if that works for you?

Which seems like a good idea to me.

> > I'd expect a safe function should be more like
> >
> >    *swpent = pte_to_swp_entry_safe(pte);
> >    return swpent_is_swap(*swpent);
> >
> > Where "safe" means that if the PTE is None or Present then
> > swpent_is_XX() == false. Ie it returns a 0 swpent and 0 swpent is
> > always nothing.
> 
> Not sure it's really 'safe', the name is unfortunate, but you could read
> this as 'always get a valid swap entry to operate on'...

My suggestion was the leaf entry has a type {none, swap, migration, etc}

And this _safe version returns the none type'd leaf entry for a
present pte.

We move toward eliminating the idea of pte_none by saying a
non-present pte is always a leaf_entry and what we call a "none pte"
is a "none leaf entry"

> leaf_entry_t leafent_from_pte()...?

Probably this one?
> > static inline bool get_pte_swap_entry(pte_t pte, swp_entry_t *entryp)
> > {
> >    return swpent_is_swap(*swpent = pte_to_swp_entry_safe(pte));
> > }
> 
> I absolutely hate that embedded assignment, but this is equivalent to what
> I suggested above, so agreed this is a good suggestion broadly.
> 
> >
> > Maybe it doesn't even need an inline at that point?
> 
> Don't understand what you mean by that. It's in a header file?

I mean just write it like this in the callers:

  swp_entry_t leafent = pte_to_swp_entry_safe(pte);

  if (swpent_is_swap(leafent)) {
  }

It is basically the same # lines as the helper version.

> > > * is_huge_pmd() - Determines if a PMD contains either a present transparent
> > >   huge page entry or a huge non-present entry. This again simplifies a lot
> > >   of logic that simply open-coded this.
> >
> > is_huge_or_swpent_pmd() would be nicer, IMHO. I think it is surprising
> > when any of these APIs accept swap entries without being explicit
> 
> Again, I'm not going to reference swap in a series intended to eliminate
> this, it defeats the purpose.
> 
> And the non-present (or whatever you want to call it) entry _is_ huge. So
> it's just adding more confusion that way IMO.

Then this:

  pmd_is_present_or_leafent(pmd)

Jason


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 05/12] fs/proc/task_mmu: refactor pagemap_pmd_range()
  2025-10-27 16:38                   ` Lorenzo Stoakes
  2025-10-27 17:08                     ` Alexander Gordeev
@ 2025-10-28 12:52                     ` Jason Gunthorpe
  2025-10-28 13:09                       ` Gregory Price
  2025-10-28 18:23                       ` Lorenzo Stoakes
  1 sibling, 2 replies; 47+ messages in thread
From: Jason Gunthorpe @ 2025-10-28 12:52 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: David Hildenbrand, Gregory Price, Andrew Morton,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens,
	Vasily Gorbik, Sven Schnelle, Zi Yan, Baolin Wang,
	Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Kemeng Shi, Kairui Song, Nhat Pham, Baoquan He,
	Chris Li, Peter Xu, Matthew Wilcox, Leon Romanovsky, Muchun Song,
	Oscar Salvador, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Jann Horn, Matthew Brost,
	Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
	Alistair Popple, Pedro Falcato, Pasha Tatashin, Rik van Riel,
	Harry Yoo, kvm, linux-s390, linux-kernel, linux-fsdevel,
	linux-mm

On Mon, Oct 27, 2025 at 04:38:05PM +0000, Lorenzo Stoakes wrote:
> On Mon, Oct 27, 2025 at 05:31:54PM +0100, David Hildenbrand wrote:
> > >
> > > I don't love the union.
> > >
> > > How would we determine what type it is, we'd have to have some
> > > generic_leaf_entry_t type or something to contain the swap type field and then
> > > cast and... is it worth it?
> > >
> > > Intent of non-present was to refer to not-swap swapentry. It's already a
> > > convention that exists, e.g. is_pmd_non_present_folio_entry().
> >
> > Just noting that this was a recent addition (still not upstream) that
> > essentially says "there is a folio here, but it's not in an ordinary present
> > page table entry.
> >
> > So we could change that to something better.
> 
> Yeah but leaf_entry_t encapsulates BOTH swap and non-swap entries.
> 
> So that's nice.
> 
> What do you propose calling non-swap leaf entries? It starts spiralling down a
> bit there.

You don't even ask that question.

You have a leaf entry. It has a type.

What you are calling a "swap entry" is a "leaf entry of swap type".

The union helps encode in the type system what code is operating on
what type of the leaf entry.

It seems pretty simple.
 
> And it's really common to have logic asserting it's actually a swap entry
> vs. not etc.

leafent_is_swap(ent) - meaning is a "leaf entry of swap type".

> 1. we keep the non-present terminology as a better way of referring
>    to non-swap entries.

I vastly prefer you leap ahead and start using leaf_entry
terminology. We don't need a temporary name we are going to throw
away.

Jason


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 05/12] fs/proc/task_mmu: refactor pagemap_pmd_range()
  2025-10-28 12:52                     ` Jason Gunthorpe
@ 2025-10-28 13:09                       ` Gregory Price
  2025-10-28 17:36                         ` Lorenzo Stoakes
  2025-10-28 18:23                       ` Lorenzo Stoakes
  1 sibling, 1 reply; 47+ messages in thread
From: Gregory Price @ 2025-10-28 13:09 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Lorenzo Stoakes, David Hildenbrand, Andrew Morton,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens,
	Vasily Gorbik, Sven Schnelle, Zi Yan, Baolin Wang,
	Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Kemeng Shi, Kairui Song, Nhat Pham, Baoquan He,
	Chris Li, Peter Xu, Matthew Wilcox, Leon Romanovsky, Muchun Song,
	Oscar Salvador, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Jann Horn, Matthew Brost,
	Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
	Alistair Popple, Pedro Falcato, Pasha Tatashin, Rik van Riel,
	Harry Yoo, kvm, linux-s390, linux-kernel, linux-fsdevel,
	linux-mm

On Tue, Oct 28, 2025 at 09:52:44AM -0300, Jason Gunthorpe wrote:
> On Mon, Oct 27, 2025 at 04:38:05PM +0000, Lorenzo Stoakes wrote:
> 
> The union helps encode in the type system what code is operating on
> what type of the leaf entry.
> 
> It seems pretty simple.
>

My recommendation of a union was a joke and is anything but simple.

Switching to a union now means every current toucher of a swp_entry_t
needs functions to do conversions to/from that thing as it gets passed
around to various subsystems. It increases overall complexity for no
value, i.e. "for negative value".

Please do not do this, I regret making the joke.

Regards,
Gregory


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 05/12] fs/proc/task_mmu: refactor pagemap_pmd_range()
  2025-10-28 13:09                       ` Gregory Price
@ 2025-10-28 17:36                         ` Lorenzo Stoakes
  0 siblings, 0 replies; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-28 17:36 UTC (permalink / raw)
  To: Gregory Price
  Cc: Jason Gunthorpe, David Hildenbrand, Andrew Morton,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens,
	Vasily Gorbik, Sven Schnelle, Zi Yan, Baolin Wang,
	Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Kemeng Shi, Kairui Song, Nhat Pham, Baoquan He,
	Chris Li, Peter Xu, Matthew Wilcox, Leon Romanovsky, Muchun Song,
	Oscar Salvador, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Jann Horn, Matthew Brost,
	Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
	Alistair Popple, Pedro Falcato, Pasha Tatashin, Rik van Riel,
	Harry Yoo, kvm, linux-s390, linux-kernel, linux-fsdevel,
	linux-mm

On Tue, Oct 28, 2025 at 09:09:01AM -0400, Gregory Price wrote:
> On Tue, Oct 28, 2025 at 09:52:44AM -0300, Jason Gunthorpe wrote:
> > On Mon, Oct 27, 2025 at 04:38:05PM +0000, Lorenzo Stoakes wrote:
> >
> > The union helps encode in the type system what code is operating on
> > what type of the leaf entry.
> >
> > It seems pretty simple.
> >
>
> My recommendation of a union was a joke and is anything but simple.
>
> Switching to a union now means every current toucher of a swp_entry_t
> needs functions to do conversions to/from that thing as it gets passed
> around to various subsystems. It increases overall complexity for no
> value, i.e. "for negative value".

This is the point I was trying to make yes.

>
> Please do not do this, I regret making the joke.

Never joke on list is the lesson here :)

I have had to learn that the hard way myself...

>
> Regards,
> Gregory

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
  2025-10-28 12:48     ` Jason Gunthorpe
@ 2025-10-28 18:20       ` Lorenzo Stoakes
  2025-10-29 14:10         ` Jason Gunthorpe
  0 siblings, 1 reply; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-28 18:20 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andrew Morton, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, David Hildenbrand, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle,
	Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Leon Romanovsky, Muchun Song, Oscar Salvador, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
	Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, Pedro Falcato,
	Pasha Tatashin, Rik van Riel, Harry Yoo, kvm, linux-s390,
	linux-kernel, linux-fsdevel, linux-mm

On Tue, Oct 28, 2025 at 09:48:17AM -0300, Jason Gunthorpe wrote:
> On Mon, Oct 27, 2025 at 05:33:57PM +0000, Lorenzo Stoakes wrote:
> > (Note I never intended this to be an RFC, it was only because of
> > series-likely-to-be-dropped causing nasty conflicts this isn't an 'out
> > there' series rather a practical submission).
> >
> > To preface, as I said elsewhere, I intend to do more on this, renaming
> > swp_entry_t to probably leaf_entry_t (thanks Gregory!)
> >
> > The issue is no matter how I do this people will theorise different
> > approaches, I'm trying to practically find a way forward that works
> > iteratively.
>
> It is why I suggested that swp_entry_t is the name we have (for this
> series at least) and lean into it as the proper name for the abstract
> idea of a multi-type'd value. Having a following series to rename
> "swp_entry_t" to some "leaf entry" will resolve the poor naming.

This is addressed below.

> But for now, "swp_entry_t" does not mean *swap* entry, it means "leaf
> entry with a really bad type name".

Yes.

>
> And swpent_* is the namespace prefix for things dealing with
> swp_entry_t.
>
> If done consistently then the switch to leaf entry naming is just a
> simple mass rename of swpent/leafent.
>
> > > That suggests functions like this:
> > >
> > > swpent_is_swap()
> > > swpent_is_migration()
> > > ..
> >
> > The _whole point_ of this series is to separate out the idea that you're
> > dealing with swap entries so I don't like swpent as a name obviously.
>
> As you say we can't fix everything at once, but if you do the above
> and then rename the end state would be
>
> leafent_is_swap()
> leafent_is_migration()
>  ..
>
> And that seems like a good end state.

This is a two wrongs don't make a right situation.

I don't want to belabour this because we ultimately agree using
leafent_xxx() now is fine.

>
> So pick the small steps, either lean into swpent in this series as the
> place holder for leafent in the next..
>
> Or this seems like a good idea too:
>
> > We could also just pre-empt and prefix functions with leafent_is_swap() if
> > you prefer.

Good. I may even go so far as to say 'thank science we agree on that' ;)

Yes I'll do this.

> >
> > We could even do:
> >
> > /* TODO: Rename swap_entry_t to leaf_entry_t */
> > typedef swap_entry_t leaf_entry_t;

BTW typo, obv. meant swp_entry_t here...

> >
> > And use the new type right away.
>
> Then the followup series is cleaning away swap_entry_t as a name.

OK so you're good with the typedef? This would be quite nice actually as we
could then use leaf_entry_t in all the core leafent_xxx() logic ahead of
time and reduce confusion _there_ and effectively document that swp_entry_t
is just badly named.

This follow up series is one I very much intend to do, it's just going to
be a big churny one (hey my speciality anyway) but one which is best done
entirely mechanically I think.

>
> > > /* True if the pte is a swpent_is_swap() */
> > > static inline bool swpent_get_swap_pte(pte_t pte, swp_entry_t *entryp)
> > > {
> > >    if (pte_present(pte))
> > >         return false;
> > >    *swpent = pte_to_swp_entry(pte);
> > >    return swpent_is_swap(*swpent);
> > > }
> >
> > I already implement in the series a pte_to_swp_entry_or_zero() function
>
> I saw, but I don't think it is a great name.. It doesn't really give
> "zero" it gives a swp_entry_t that doesn't pass any of the
> swpent_is_XX() functions. ie a none type.

Naming is hard...

I mean really it wouldn't be all too awful to have pte_to_leafent() do this
now...

>
> > that goes one further - checks pte_present() for you, if pte_none() you
> > just get an empty swap entry, so this can be:
>
> And I was hoping to see a path to get rid of the pte_none() stuff, or
> at least on most arches. It is pretty pointless to check for pte_none
> if the arch has a none-pte that already is 0..
>
> So pte_none can be more like:
>    swpent_is_none(pte_to_swp_entry(pte))
>
> Where pte_to_swp_entry is just some bit maths with no conditionals.

*leafent

I mean I'm not so sure that's all that useful, you often want to skip over
things that are 'none' entries without doing this conversion.

We could use the concept of 'none is an empty leaf_entry_t' more thoroughly
internally in functions though.

I will see what I can do.

>
> > > I also think it will be more readable to keep all these things under a
> > > swpent namespace instead of using unstructured english names.
> >
> > Nope. Again, the whole point of the series is to avoid referencing
> > swap. swpent_xxx() is just eliminating the purpose of the series right?
> >
> > Yes it sucks that the type name is what it is, but this is an iterative
> > process.
>
> Sure, but don't add a bunch of new names with *no namespace*. As above
> either accept swpent is a placeholder for leafent in the next series,
> or do this:
>
> > But as above, we could pre-empt future changes and prefix with a
> > leafent_*() prefix if that works for you?
>
> Which seems like a good idea to me.

Yup. We agree on this.

>
> > > I'd expect a safe function should be more like
> > >
> > >    *swpent = pte_to_swp_entry_safe(pte);
> > >    return swpent_is_swap(*swpent);
> > >
> > > Where "safe" means that if the PTE is None or Present then
> > > swpent_is_XX() == false. Ie it returns a 0 swpent and 0 swpent is
> > > always nothing.
> >
> > Not sure it's really 'safe', the name is unfortunate, but you could read
> > this as 'always get a valid swap entry to operate on'...
>
> My suggestion was the leaf entry has a type {none, swap, migration, etc}
>
> And this _safe version returns the none type'd leaf entry for a
> present pte.

I mean that's already what's happening more or less with the ..._is_zero()
function (albeit needing a rename).

>
> We move toward eliminating the idea of pte_none by saying a
> non-present pte is always a leaf_entry and what we call a "none pte"
> is a "none leaf entry"

Well as discussed above.

>
> > leaf_entry_t leafent_from_pte()...?
>
> Probably this one?
> > > static inline bool get_pte_swap_entry(pte_t pte, swp_entry_t *entryp)
> > > {
> > >    return swpent_is_swap(*swpent = pte_to_swp_entry_safe(pte));
> > > }
> >
> > I absolutely hate that embedded assignment, but this is equivalent to what
> > I suggested above, so agreed this is a good suggestion broadly.
> >
> > >
> > > Maybe it doesn't even need an inline at that point?
> >
> > Don't understand what you mean by that. It's in a header file?
>
> I mean just write it like this in the callers:
>
>   swp_entry_t leafent = pte_to_swp_entry_safe(pte);
>
>   if (swpent_is_swap(leafent)) {
>   }
>
> It is basically the same # lines as the helper version.

Right, good point!

>
> > > > * is_huge_pmd() - Determines if a PMD contains either a present transparent
> > > >   huge page entry or a huge non-present entry. This again simplifies a lot
> > > >   of logic that simply open-coded this.
> > >
> > > is_huge_or_swpent_pmd() would be nicer, IMHO. I think it is surprising
> > > when any of these APIs accept swap entries without being explicit
> >
> > Again, I'm not going to reference swap in a series intended to eliminate
> > this, it defeats the purpose.
> >
> > And the non-present (or whatever you want to call it) entry _is_ huge. So
> > it's just adding more confusion that way IMO.
>
> Then this:
>
>   pmd_is_present_or_leafent(pmd)

A PMD can be present and contain an entry pointing at a PTE table so I'm
not sure that helps... naming is hard :)

Will think of alternatives on respin.

>
> Jason

Thanks, Lorenzo


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 05/12] fs/proc/task_mmu: refactor pagemap_pmd_range()
  2025-10-28 12:52                     ` Jason Gunthorpe
  2025-10-28 13:09                       ` Gregory Price
@ 2025-10-28 18:23                       ` Lorenzo Stoakes
  1 sibling, 0 replies; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-28 18:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: David Hildenbrand, Gregory Price, Andrew Morton,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens,
	Vasily Gorbik, Sven Schnelle, Zi Yan, Baolin Wang,
	Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Kemeng Shi, Kairui Song, Nhat Pham, Baoquan He,
	Chris Li, Peter Xu, Matthew Wilcox, Leon Romanovsky, Muchun Song,
	Oscar Salvador, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Jann Horn, Matthew Brost,
	Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
	Alistair Popple, Pedro Falcato, Pasha Tatashin, Rik van Riel,
	Harry Yoo, kvm, linux-s390, linux-kernel, linux-fsdevel,
	linux-mm

On Tue, Oct 28, 2025 at 09:52:44AM -0300, Jason Gunthorpe wrote:
> On Mon, Oct 27, 2025 at 04:38:05PM +0000, Lorenzo Stoakes wrote:
> > On Mon, Oct 27, 2025 at 05:31:54PM +0100, David Hildenbrand wrote:
> > > >
> > > > I don't love the union.
> > > >
> > > > How would we determine what type it is, we'd have to have some
> > > > generic_leaf_entry_t type or something to contain the swap type field and then
> > > > cast and... is it worth it?
> > > >
> > > > Intent of non-present was to refer to not-swap swapentry. It's already a
> > > > convention that exists, e.g. is_pmd_non_present_folio_entry().
> > >
> > > Just noting that this was a recent addition (still not upstream) that
> > > essentially says "there is a folio here, but it's not in an ordinary present
> > > page table entry.
> > >
> > > So we could change that to something better.
> >
> > Yeah but leaf_entry_t encapsulates BOTH swap and non-swap entries.
> >
> > So that's nice.
> >
> > What do you propose calling non-swap leaf entries? It starts spiralling down a
> > bit there.
>
> You don't even ask that question.
>
> You have a leaf entry. It has a type.
>
> What you are calling a "swap entry" is a "leaf entry of swap type".

I think this is pretty well covered in the other thread tbh.

>
> The union helps encode in the type system what code is operating on
> what type of the leaf entry.
>
> It seems pretty simple.

As Gregory says, this requires reworking a lot of code. We at the very least
need to defer this, and I remain unconvinced it's really worth it.

So yeah, let's come back to this later.

>
> > And it's really common to have logic asserting it's actually a swap entry
> > vs. not etc.
>
> leafent_is_swap(ent) - meaning is a "leaf entry of swap type".

I mean, we already have a function that does this in this series with a
different name :)

But sure I'll rename it to this so we're good.

>
> > 1. we keep the non-present terminology as a better way of referring
> >    to non-swap entries.
>
> I vastly prefer you leap ahead and start using leaf_entry
> terminology. We don't need a temporary name we are going to throw
> away.

Right well we agree on the other series to use the leafent_xxx() prefix so lets'
do that.

>
> Jason

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
  2025-10-28 18:20       ` Lorenzo Stoakes
@ 2025-10-29 14:10         ` Jason Gunthorpe
  2025-10-29 19:09           ` Lorenzo Stoakes
  0 siblings, 1 reply; 47+ messages in thread
From: Jason Gunthorpe @ 2025-10-29 14:10 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, David Hildenbrand, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle,
	Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Leon Romanovsky, Muchun Song, Oscar Salvador, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
	Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, Pedro Falcato,
	Pasha Tatashin, Rik van Riel, Harry Yoo, kvm, linux-s390,
	linux-kernel, linux-fsdevel, linux-mm

On Tue, Oct 28, 2025 at 06:20:54PM +0000, Lorenzo Stoakes wrote:
> > > And use the new type right away.
> >
> > Then the followup series is cleaning away swap_entry_t as a name.
> 
> OK so you're good with the typedef? This would be quite nice actually as we
> could then use leaf_entry_t in all the core leafent_xxx() logic ahead of
> time and reduce confusion _there_ and effectively document that swp_entry_t
> is just badly named.

Yeah, I think so, a commit message explaining it is temporary and a
future series will mechanically rename it away and this is
preparation.

> I mean I'm not so sure that's all that useful, you often want to skip over
> things that are 'none' entries without doing this conversion.

Maybe go directly from a pte to the leaf entry type for this check?

#define __swp_type(x) ((x).val >> (64 - SWP_TYPE_BITS))

That's basically free on most arches..

> We could use the concept of 'none is an empty leaf_entry_t' more thoroughly
> internally in functions though.
> 
> I will see what I can do.

Sure, maybe something works out

Though if we want to keep them seperate then maybe pte_is_leafent() is
the right name for pte_none(). Reads so much better like this:

if (pte_is_leafent(pte)) {
    leafent_t leaf = leafent_from_pte(pte)

     if (leafent_is_swap(leaf)) {..}
}

> > Then this:
> >
> >   pmd_is_present_or_leafent(pmd)
> 
> A PMD can be present and contain an entry pointing at a PTE table so I'm
> not sure that helps... naming is hard :)

pmd_is_leaf_or_leafent()

In the PTE API we are calling present entries that are address, not
tables, leafs.

Jason


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
  2025-10-29 14:10         ` Jason Gunthorpe
@ 2025-10-29 19:09           ` Lorenzo Stoakes
  2025-10-29 21:23             ` Gregory Price
  0 siblings, 1 reply; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-29 19:09 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andrew Morton, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, David Hildenbrand, Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle,
	Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
	Dev Jain, Barry Song, Lance Yang, Kemeng Shi, Kairui Song,
	Nhat Pham, Baoquan He, Chris Li, Peter Xu, Matthew Wilcox,
	Leon Romanovsky, Muchun Song, Oscar Salvador, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Jann Horn,
	Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, Pedro Falcato,
	Pasha Tatashin, Rik van Riel, Harry Yoo, kvm, linux-s390,
	linux-kernel, linux-fsdevel, linux-mm

On Wed, Oct 29, 2025 at 11:10:48AM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 28, 2025 at 06:20:54PM +0000, Lorenzo Stoakes wrote:
> > > > And use the new type right away.
> > >
> > > Then the followup series is cleaning away swap_entry_t as a name.
> >
> > OK so you're good with the typedef? This would be quite nice actually as we
> > could then use leaf_entry_t in all the core leafent_xxx() logic ahead of
> > time and reduce confusion _there_ and effectively document that swp_entry_t
> > is just badly named.
>
> Yeah, I think so, a commit message explaining it is temporary and a
> future series will mechanically rename it away and this is
> preparation.
>
> > I mean I'm not so sure that's all that useful, you often want to skip over
> > things that are 'none' entries without doing this conversion.
>
> Maybe go directly from a pte to the leaf entry type for this check?
>
> #define __swp_type(x) ((x).val >> (64 - SWP_TYPE_BITS))
>
> That's basically free on most arches..

That's nice, I guess we could throw in a pte_present() check there and just grab
the type out direct like that

>
> > We could use the concept of 'none is an empty leaf_entry_t' more thoroughly
> > internally in functions though.
> >
> > I will see what I can do.
>
> Sure, maybe something works out
>
> Though if we want to keep them seperate then maybe pte_is_leafent() is
> the right name for pte_none(). Reads so much better like this:
>
> if (pte_is_leafent(pte)) {

Ah so this would amount to !pte_is_present()

>     leafent_t leaf = leafent_from_pte(pte)
>
>      if (leafent_is_swap(leaf)) {..}

And yeah... that is nice you know... :)

> }

>
> > > Then this:
> > >
> > >   pmd_is_present_or_leafent(pmd)
> >
> > A PMD can be present and contain an entry pointing at a PTE table so I'm
> > not sure that helps... naming is hard :)
>
> pmd_is_leaf_or_leafent()
>
> In the PTE API we are calling present entries that are address, not
> tables, leafs.

Hmm I think pmd_is_present_or_leafent() is clearer actually on second
thoughts :)

Still feel a desire to shove a 'huge' in there though but then it's getting
wordy... :)

Let me play around...

>
> Jason

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
  2025-10-29 19:09           ` Lorenzo Stoakes
@ 2025-10-29 21:23             ` Gregory Price
  2025-10-30 10:21               ` Lorenzo Stoakes
  2025-11-02 14:27               ` Lorenzo Stoakes
  0 siblings, 2 replies; 47+ messages in thread
From: Gregory Price @ 2025-10-29 21:23 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Jason Gunthorpe, Andrew Morton, Christian Borntraeger,
	Janosch Frank, Claudio Imbrenda, David Hildenbrand,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens,
	Vasily Gorbik, Sven Schnelle, Zi Yan, Baolin Wang,
	Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Kemeng Shi, Kairui Song, Nhat Pham, Baoquan He,
	Chris Li, Peter Xu, Matthew Wilcox, Leon Romanovsky, Muchun Song,
	Oscar Salvador, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Jann Horn, Matthew Brost,
	Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
	Alistair Popple, Pedro Falcato, Pasha Tatashin, Rik van Riel,
	Harry Yoo, kvm, linux-s390, linux-kernel, linux-fsdevel,
	linux-mm

On Wed, Oct 29, 2025 at 07:09:59PM +0000, Lorenzo Stoakes wrote:
> >
> > pmd_is_leaf_or_leafent()
> >
> > In the PTE API we are calling present entries that are address, not
> > tables, leafs.
> 
> Hmm I think pmd_is_present_or_leafent() is clearer actually on second
> thoughts :)
> 

apologies if misunderstanding, but I like short names :]

#define pmd_exists(entry) (pmd_is_present() || pmd_is_leafent())

If you care about what that entry is, you'll have to spell out these
checks in your code anyway, so no need to explode the naming to include
everything that might be there.

~Gregory


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
  2025-10-29 21:23             ` Gregory Price
@ 2025-10-30 10:21               ` Lorenzo Stoakes
  2025-11-02 14:27               ` Lorenzo Stoakes
  1 sibling, 0 replies; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-10-30 10:21 UTC (permalink / raw)
  To: Gregory Price
  Cc: Jason Gunthorpe, Andrew Morton, Christian Borntraeger,
	Janosch Frank, Claudio Imbrenda, David Hildenbrand,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens,
	Vasily Gorbik, Sven Schnelle, Zi Yan, Baolin Wang,
	Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Kemeng Shi, Kairui Song, Nhat Pham, Baoquan He,
	Chris Li, Peter Xu, Matthew Wilcox, Leon Romanovsky, Muchun Song,
	Oscar Salvador, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Jann Horn, Matthew Brost,
	Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
	Alistair Popple, Pedro Falcato, Pasha Tatashin, Rik van Riel,
	Harry Yoo, kvm, linux-s390, linux-kernel, linux-fsdevel,
	linux-mm

On Wed, Oct 29, 2025 at 05:23:39PM -0400, Gregory Price wrote:
> On Wed, Oct 29, 2025 at 07:09:59PM +0000, Lorenzo Stoakes wrote:
> > >
> > > pmd_is_leaf_or_leafent()
> > >
> > > In the PTE API we are calling present entries that are address, not
> > > tables, leafs.
> >
> > Hmm I think pmd_is_present_or_leafent() is clearer actually on second
> > thoughts :)
> >
>
> apologies if misunderstanding, but I like short names :]
>
> #define pmd_exists(entry) (pmd_is_present() || pmd_is_leafent())
>
> If you care about what that entry is, you'll have to spell out these
> checks in your code anyway, so no need to explode the naming to include
> everything that might be there.

Yeah that's a fair point, and I prefer short names also :)

This does read better thanks!

I should try for a non-RFC respin relatively soon now I've sent the other two
series I've been working on recently.

>
> ~Gregory

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
  2025-10-29 21:23             ` Gregory Price
  2025-10-30 10:21               ` Lorenzo Stoakes
@ 2025-11-02 14:27               ` Lorenzo Stoakes
  1 sibling, 0 replies; 47+ messages in thread
From: Lorenzo Stoakes @ 2025-11-02 14:27 UTC (permalink / raw)
  To: Gregory Price
  Cc: Jason Gunthorpe, Andrew Morton, Christian Borntraeger,
	Janosch Frank, Claudio Imbrenda, David Hildenbrand,
	Alexander Gordeev, Gerald Schaefer, Heiko Carstens,
	Vasily Gorbik, Sven Schnelle, Zi Yan, Baolin Wang,
	Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, Kemeng Shi, Kairui Song, Nhat Pham, Baoquan He,
	Chris Li, Peter Xu, Matthew Wilcox, Leon Romanovsky, Muchun Song,
	Oscar Salvador, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Jann Horn, Matthew Brost,
	Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
	Alistair Popple, Pedro Falcato, Pasha Tatashin, Rik van Riel,
	Harry Yoo, kvm, linux-s390, linux-kernel, linux-fsdevel,
	linux-mm

On Wed, Oct 29, 2025 at 05:23:39PM -0400, Gregory Price wrote:
> On Wed, Oct 29, 2025 at 07:09:59PM +0000, Lorenzo Stoakes wrote:
> > >
> > > pmd_is_leaf_or_leafent()
> > >
> > > In the PTE API we are calling present entries that are address, not
> > > tables, leafs.
> >
> > Hmm I think pmd_is_present_or_leafent() is clearer actually on second
> > thoughts :)
> >
>
> apologies if misunderstanding, but I like short names :]
>
> #define pmd_exists(entry) (pmd_is_present() || pmd_is_leafent())
>
> If you care about what that entry is, you'll have to spell out these
> checks in your code anyway, so no need to explode the naming to include
> everything that might be there.

Ah actually one issue we have here is that huge_mm.h is _super_ early in
header import order so we can't use any leafent stuff here at all, which
sucks.

Will have to compromise a bit here unfortunately!

>
> ~Gregory

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2025-11-02 14:27 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-24  7:41 [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion Lorenzo Stoakes
2025-10-24  7:41 ` [RFC PATCH 01/12] mm: introduce and use pte_to_swp_entry_or_zero() Lorenzo Stoakes
2025-10-24  7:41 ` [RFC PATCH 02/12] mm: avoid unnecessary uses of is_swap_pte() Lorenzo Stoakes
2025-10-24  7:41 ` [RFC PATCH 03/12] mm: introduce get_pte_swap_entry() and use it Lorenzo Stoakes
2025-10-24  7:41 ` [RFC PATCH 04/12] mm: use get_pte_swap_entry() in debug pgtable + remove is_swap_pte() Lorenzo Stoakes
2025-10-24  7:41 ` [RFC PATCH 05/12] fs/proc/task_mmu: refactor pagemap_pmd_range() Lorenzo Stoakes
2025-10-24 17:32   ` Gregory Price
2025-10-24 18:19     ` Lorenzo Stoakes
2025-10-24 19:12       ` Gregory Price
2025-10-24 20:15         ` Lorenzo Stoakes
2025-10-24 20:37           ` Gregory Price
2025-10-27 15:26             ` Lorenzo Stoakes
2025-10-27 16:11             ` Jason Gunthorpe
2025-10-27 16:15               ` David Hildenbrand
2025-10-27 16:26               ` Lorenzo Stoakes
2025-10-27 16:31                 ` David Hildenbrand
2025-10-27 16:38                   ` Lorenzo Stoakes
2025-10-27 17:08                     ` Alexander Gordeev
2025-10-28 12:52                     ` Jason Gunthorpe
2025-10-28 13:09                       ` Gregory Price
2025-10-28 17:36                         ` Lorenzo Stoakes
2025-10-28 18:23                       ` Lorenzo Stoakes
2025-10-27 16:38                 ` Gregory Price
2025-10-24  7:41 ` [RFC PATCH 06/12] mm: avoid unnecessary use of is_swap_pmd() Lorenzo Stoakes
2025-10-24  7:41 ` [RFC PATCH 07/12] mm: introduce is_huge_pmd() and use where appropriate Lorenzo Stoakes
2025-10-24  7:41 ` [RFC PATCH 08/12] mm/huge_memory: refactor copy_huge_pmd() non-present logic Lorenzo Stoakes
2025-10-24  7:41 ` [RFC PATCH 09/12] mm/huge_memory: refactor change_huge_pmd() " Lorenzo Stoakes
2025-10-24 18:41   ` Gregory Price
2025-10-24 18:44     ` Lorenzo Stoakes
2025-10-24 19:09       ` Gregory Price
2025-10-24  7:41 ` [RFC PATCH 10/12] mm: remove remaining is_swap_pmd() users and is_swap_pmd() Lorenzo Stoakes
2025-10-24  7:41 ` [RFC PATCH 11/12] mm: rename non_swap_entry() to is_non_present_entry() Lorenzo Stoakes
2025-10-24 19:07   ` Gregory Price
2025-10-24 20:17     ` Lorenzo Stoakes
2025-10-24  7:41 ` [RFC PATCH 12/12] mm: provide is_swap_entry() and use it Lorenzo Stoakes
2025-10-24 20:05 ` [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion Yosry Ahmed
2025-10-24 20:14   ` Lorenzo Stoakes
2025-10-27 16:09 ` Jason Gunthorpe
2025-10-27 17:33   ` Lorenzo Stoakes
2025-10-28 12:48     ` Jason Gunthorpe
2025-10-28 18:20       ` Lorenzo Stoakes
2025-10-29 14:10         ` Jason Gunthorpe
2025-10-29 19:09           ` Lorenzo Stoakes
2025-10-29 21:23             ` Gregory Price
2025-10-30 10:21               ` Lorenzo Stoakes
2025-11-02 14:27               ` Lorenzo Stoakes
2025-10-27 23:32   ` Gregory Price

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox