* [PATCH mm-new v3 1/1] mm/khugepaged: guard is_zero_pfn() calls with pte_present()
@ 2025-10-20 15:11 Lance Yang
  2025-10-20 17:14 ` David Hildenbrand
  2025-10-20 17:46 ` Lorenzo Stoakes
  0 siblings, 2 replies; 3+ messages in thread
From: Lance Yang @ 2025-10-20 15:11 UTC (permalink / raw)
  To: akpm, david, lorenzo.stoakes
  Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
	baohua, ioworker0, linux-kernel, linux-mm, Wei Yang, Lance Yang

From: Lance Yang <lance.yang@linux.dev>

A non-present entry, like a swap PTE, contains completely different data
(swap type and offset). pte_pfn() doesn't know this, so if we feed it a
non-present entry, it will spit out a junk PFN.

What if that junk PFN happens to match the zeropage's PFN by sheer
chance? While really unlikely, khugepaged would then mistake the swap
entry for the shared zeropage and mishandle it, which would be really
bad.

So, let's fix this potential bug by ensuring all calls to is_zero_pfn()
in khugepaged.c are properly guarded by a pte_present() check.

Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Nico Pache <npache@redhat.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
Applies against commit a61ca1246ad3 in mm-new.

v2 -> v3:
 - Collect Reviewed-by from Nico - thanks!
 - Add a VM_WARN_ON_ONCE() for unexpected PTEs (per David)
 - Introduce a pte_none_or_zero() helper to reduce duplication
   (per David and Lorenzo)
 - https://lore.kernel.org/linux-mm/20251017093847.36436-1-lance.yang@linux.dev/

v1 -> v2:
 - Collect Reviewed-by from Dev, Wei and Baolin - thanks!
 - Reduce a level of indentation (per Dev)
 - https://lore.kernel.org/linux-mm/20251016033643.10848-1-lance.yang@linux.dev/

 mm/khugepaged.c | 35 +++++++++++++++++++++--------------
 1 file changed, 21 insertions(+), 14 deletions(-)
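
For illustration only, not part of the patch below: a minimal standalone
sketch of the hazard, using a simplified, made-up PTE encoding (the real
layout is arch-specific, and pte_present/pte_pfn/is_zero_pfn here are
stand-ins for the kernel helpers). A swap PTE reuses the bits that
pte_pfn() would read, so decoding them as a PFN can, by sheer chance,
produce the zeropage's PFN; checking pte_present() first avoids
interpreting those bits at all.

/*
 * Standalone sketch with a hypothetical PTE layout: bit 0 is the present
 * bit, the PFN lives in bits 12+. Real encodings differ per architecture.
 */
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT	0x1ULL

static const uint64_t zero_pfn = 0x1234;	/* made-up zeropage PFN */

static bool pte_present(uint64_t pte)	{ return pte & PTE_PRESENT; }
static uint64_t pte_pfn(uint64_t pte)	{ return pte >> 12; }
static bool is_zero_pfn(uint64_t pfn)	{ return pfn == zero_pfn; }

/* Buggy: a swap entry whose type/offset bits decode to 0x1234 matches. */
static bool unguarded_is_zero(uint64_t pte)
{
	return is_zero_pfn(pte_pfn(pte));
}

/* Fixed: only read the PFN bits once the entry is known to be present. */
static bool guarded_is_zero(uint64_t pte)
{
	return pte_present(pte) && is_zero_pfn(pte_pfn(pte));
}

int main(void)
{
	/* Non-present entry whose payload happens to decode to 0x1234. */
	uint64_t swap_pte = 0x1234ULL << 12;

	/* Exit 0: the unguarded check is fooled, the guarded one is not. */
	return (unguarded_is_zero(swap_pte) && !guarded_is_zero(swap_pte)) ? 0 : 1;
}

The pte_none_or_zero() helper in the diff below encodes the same guard,
with pte_none() handled up front.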

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index d635d821f611..6f2ae2238b5b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -337,6 +337,13 @@ struct attribute_group khugepaged_attr_group = {
 };
 #endif /* CONFIG_SYSFS */
 
+static bool pte_none_or_zero(pte_t pte)
+{
+	if (pte_none(pte))
+		return true;
+	return pte_present(pte) && is_zero_pfn(pte_pfn(pte));
+}
+
 int hugepage_madvise(struct vm_area_struct *vma,
 		     vm_flags_t *vm_flags, int advice)
 {
@@ -518,6 +525,7 @@ static void release_pte_pages(pte_t *pte, pte_t *_pte,
 
 		if (pte_none(pteval))
 			continue;
+		VM_WARN_ON_ONCE(!pte_present(pteval));
 		pfn = pte_pfn(pteval);
 		if (is_zero_pfn(pfn))
 			continue;
@@ -548,8 +556,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
 	     _pte++, addr += PAGE_SIZE) {
 		pte_t pteval = ptep_get(_pte);
-		if (pte_none(pteval) || (pte_present(pteval) &&
-				is_zero_pfn(pte_pfn(pteval)))) {
+		if (pte_none_or_zero(pteval)) {
 			++none_or_zero;
 			if (!userfaultfd_armed(vma) &&
 			    (!cc->is_khugepaged ||
@@ -690,17 +697,17 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
 	     address += nr_ptes * PAGE_SIZE) {
 		nr_ptes = 1;
 		pteval = ptep_get(_pte);
-		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
+		if (pte_none_or_zero(pteval)) {
 			add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1);
-			if (is_zero_pfn(pte_pfn(pteval))) {
-				/*
-				 * ptl mostly unnecessary.
-				 */
-				spin_lock(ptl);
-				ptep_clear(vma->vm_mm, address, _pte);
-				spin_unlock(ptl);
-				ksm_might_unmap_zero_page(vma->vm_mm, pteval);
-			}
+			if (pte_none(pteval))
+				continue;
+			/*
+			 * ptl mostly unnecessary.
+			 */
+			spin_lock(ptl);
+			ptep_clear(vma->vm_mm, address, _pte);
+			spin_unlock(ptl);
+			ksm_might_unmap_zero_page(vma->vm_mm, pteval);
 		} else {
 			struct page *src_page = pte_page(pteval);
 
@@ -794,7 +801,7 @@ static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
 		unsigned long src_addr = address + i * PAGE_SIZE;
 		struct page *src_page;
 
-		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
+		if (pte_none_or_zero(pteval)) {
 			clear_user_highpage(page, src_addr);
 			continue;
 		}
@@ -1294,7 +1301,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 				goto out_unmap;
 			}
 		}
-		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
+		if (pte_none_or_zero(pteval)) {
 			++none_or_zero;
 			if (!userfaultfd_armed(vma) &&
 			    (!cc->is_khugepaged ||
-- 
2.49.0




* Re: [PATCH mm-new v3 1/1] mm/khugepaged: guard is_zero_pfn() calls with pte_present()
  2025-10-20 15:11 [PATCH mm-new v3 1/1] mm/khugepaged: guard is_zero_pfn() calls with pte_present() Lance Yang
@ 2025-10-20 17:14 ` David Hildenbrand
  2025-10-20 17:46 ` Lorenzo Stoakes
  1 sibling, 0 replies; 3+ messages in thread
From: David Hildenbrand @ 2025-10-20 17:14 UTC (permalink / raw)
  To: Lance Yang, akpm, lorenzo.stoakes
  Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
	baohua, ioworker0, linux-kernel, linux-mm, Wei Yang

On 20.10.25 17:11, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
> 
> A non-present entry, like a swap PTE, contains completely different data
> (swap type and offset). pte_pfn() doesn't know this, so if we feed it a
> non-present entry, it will spit out a junk PFN.
> 
> What if that junk PFN happens to match the zeropage's PFN by sheer
> chance? While really unlikely, khugepaged would then mistake the swap
> entry for the shared zeropage and mishandle it, which would be really
> bad.
> 
> So, let's fix this potential bug by ensuring all calls to is_zero_pfn()
> in khugepaged.c are properly guarded by a pte_present() check.
> 
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Reviewed-by: Nico Pache <npache@redhat.com>
> Reviewed-by: Dev Jain <dev.jain@arm.com>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---

Works for me!

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers

David / dhildenb




* Re: [PATCH mm-new v3 1/1] mm/khugepaged: guard is_zero_pfn() calls with pte_present()
  2025-10-20 15:11 [PATCH mm-new v3 1/1] mm/khugepaged: guard is_zero_pfn() calls with pte_present() Lance Yang
  2025-10-20 17:14 ` David Hildenbrand
@ 2025-10-20 17:46 ` Lorenzo Stoakes
  1 sibling, 0 replies; 3+ messages in thread
From: Lorenzo Stoakes @ 2025-10-20 17:46 UTC (permalink / raw)
  To: Lance Yang
  Cc: akpm, david, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, ioworker0, linux-kernel,
	linux-mm, Wei Yang

On Mon, Oct 20, 2025 at 11:11:11PM +0800, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> A non-present entry, like a swap PTE, contains completely different data
> (swap type and offset). pte_pfn() doesn't know this, so if we feed it a
> non-present entry, it will spit out a junk PFN.
>
> What if that junk PFN happens to match the zeropage's PFN by sheer
> chance? While really unlikely, khugepaged would then mistake the swap
> entry for the shared zeropage and mishandle it, which would be really
> bad.
>
> So, let's fix this potential bug by ensuring all calls to is_zero_pfn()
> in khugepaged.c are properly guarded by a pte_present() check.
>
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Reviewed-by: Nico Pache <npache@redhat.com>
> Reviewed-by: Dev Jain <dev.jain@arm.com>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>

This LGTM thanks for this, so:

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

> ---
> Applies against commit a61ca1246ad3 in mm-new.
>
> v2 -> v3:
>  - Collect Reviewed-by from Nico - thanks!
>  - Add a VM_WARN_ON_ONCE() for unexpected PTEs (per David)
>  - Introduce a pte_none_or_zero() helper to reduce duplication
>    (per David and Lorenzo)

Wow, I hadn't realised David had suggested that too; that was actually
both of us doing it independently by chance lol.

I guess we agree then :)

>  - https://lore.kernel.org/linux-mm/20251017093847.36436-1-lance.yang@linux.dev/
>
> v1 -> v2:
>  - Collect Reviewed-by from Dev, Wei and Baolin - thanks!
>  - Reduce a level of indentation (per Dev)
>  - https://lore.kernel.org/linux-mm/20251016033643.10848-1-lance.yang@linux.dev/
>
>  mm/khugepaged.c | 35 +++++++++++++++++++++--------------
>  1 file changed, 21 insertions(+), 14 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index d635d821f611..6f2ae2238b5b 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -337,6 +337,13 @@ struct attribute_group khugepaged_attr_group = {
>  };
>  #endif /* CONFIG_SYSFS */
>
> +static bool pte_none_or_zero(pte_t pte)
> +{
> +	if (pte_none(pte))
> +		return true;
> +	return pte_present(pte) && is_zero_pfn(pte_pfn(pte));
> +}
> +
>  int hugepage_madvise(struct vm_area_struct *vma,
>  		     vm_flags_t *vm_flags, int advice)
>  {
> @@ -518,6 +525,7 @@ static void release_pte_pages(pte_t *pte, pte_t *_pte,
>
>  		if (pte_none(pteval))
>  			continue;
> +		VM_WARN_ON_ONCE(!pte_present(pteval));
>  		pfn = pte_pfn(pteval);
>  		if (is_zero_pfn(pfn))
>  			continue;
> @@ -548,8 +556,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>  	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
>  	     _pte++, addr += PAGE_SIZE) {
>  		pte_t pteval = ptep_get(_pte);
> -		if (pte_none(pteval) || (pte_present(pteval) &&
> -				is_zero_pfn(pte_pfn(pteval)))) {
> +		if (pte_none_or_zero(pteval)) {
>  			++none_or_zero;
>  			if (!userfaultfd_armed(vma) &&
>  			    (!cc->is_khugepaged ||
> @@ -690,17 +697,17 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
>  	     address += nr_ptes * PAGE_SIZE) {
>  		nr_ptes = 1;
>  		pteval = ptep_get(_pte);
> -		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> +		if (pte_none_or_zero(pteval)) {
>  			add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1);
> -			if (is_zero_pfn(pte_pfn(pteval))) {
> -				/*
> -				 * ptl mostly unnecessary.
> -				 */
> -				spin_lock(ptl);
> -				ptep_clear(vma->vm_mm, address, _pte);
> -				spin_unlock(ptl);
> -				ksm_might_unmap_zero_page(vma->vm_mm, pteval);
> -			}
> +			if (pte_none(pteval))
> +				continue;
> +			/*
> +			 * ptl mostly unnecessary.
> +			 */
> +			spin_lock(ptl);
> +			ptep_clear(vma->vm_mm, address, _pte);
> +			spin_unlock(ptl);
> +			ksm_might_unmap_zero_page(vma->vm_mm, pteval);
>  		} else {
>  			struct page *src_page = pte_page(pteval);
>
> @@ -794,7 +801,7 @@ static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
>  		unsigned long src_addr = address + i * PAGE_SIZE;
>  		struct page *src_page;
>
> -		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> +		if (pte_none_or_zero(pteval)) {
>  			clear_user_highpage(page, src_addr);
>  			continue;
>  		}
> @@ -1294,7 +1301,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>  				goto out_unmap;
>  			}
>  		}
> -		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> +		if (pte_none_or_zero(pteval)) {
>  			++none_or_zero;
>  			if (!userfaultfd_armed(vma) &&
>  			    (!cc->is_khugepaged ||
> --
> 2.49.0
>

