linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [Patch v2 0/2] mm_slot: fix the usage of mm_slot_entry
@ 2025-09-19  7:12 Wei Yang
  2025-09-19  7:12 ` [Patch v2 1/2] mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL Wei Yang
  2025-09-19  7:12 ` [Patch v2 2/2] mm/khugepaged: remove definition of struct khugepaged_mm_slot Wei Yang
  0 siblings, 2 replies; 16+ messages in thread
From: Wei Yang @ 2025-09-19  7:12 UTC (permalink / raw)
  To: akpm, david, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang, xu.xin16
  Cc: linux-mm, Wei Yang

The usage of mm_slot_entry() in ksm/khugepaged is not correct. In case
mm_slot_lookup() return a NULL slot, mm_slot_entry() should not be called.

To fix this:

Patch 1: check slot before continue in ksm.c
Patch 2: remove the definition of khugepaged_mm_slot

v2: 
  fix the error in code instead guard by compiler

V1:
  add a BUILD_BUG_ON_MSG() to make sure slot is the first element

[1]: https://lkml.kernel.org/r/20250914000026.17986-1-richard.weiyang@gmail.com

Wei Yang (2):
  mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL
  mm/khugepaged: remove definition of struct khugepaged_mm_slot

 mm/khugepaged.c | 57 ++++++++++++++++++-------------------------------
 mm/ksm.c        | 20 +++++++++--------
 2 files changed, 32 insertions(+), 45 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Patch v2 1/2] mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL
  2025-09-19  7:12 [Patch v2 0/2] mm_slot: fix the usage of mm_slot_entry Wei Yang
@ 2025-09-19  7:12 ` Wei Yang
  2025-09-19  7:24   ` David Hildenbrand
                     ` (2 more replies)
  2025-09-19  7:12 ` [Patch v2 2/2] mm/khugepaged: remove definition of struct khugepaged_mm_slot Wei Yang
  1 sibling, 3 replies; 16+ messages in thread
From: Wei Yang @ 2025-09-19  7:12 UTC (permalink / raw)
  To: akpm, david, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang, xu.xin16
  Cc: linux-mm, Wei Yang, Kiryl Shutsemau

When using mm_slot in ksm, there is code snip like:

     slot = mm_slot_lookup(mm_slots_hash, mm);
     mm_slot = mm_slot_entry(slot, struct ksm_mm_slot, slot);
     if (mm_slot && ..) {
     }

The mm_slot_entry() won't return a valid value if slot is NULL
generally. But currently it works since slot is the first element of
struct ksm_mm_slot.

To reduce the ambiguity and make it robust, access mm_slot_entry() when
slot is !NULL.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Kiryl Shutsemau <kirill@shutemov.name>
Cc: xu.xin16@zte.com.cn

---
v2:
  * check on slot before continue
---
 mm/ksm.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 2ef29802a49b..6ff7f840e50a 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2939,15 +2939,17 @@ void __ksm_exit(struct mm_struct *mm)
 
 	spin_lock(&ksm_mmlist_lock);
 	slot = mm_slot_lookup(mm_slots_hash, mm);
-	mm_slot = mm_slot_entry(slot, struct ksm_mm_slot, slot);
-	if (mm_slot && ksm_scan.mm_slot != mm_slot) {
-		if (!mm_slot->rmap_list) {
-			hash_del(&slot->hash);
-			list_del(&slot->mm_node);
-			easy_to_free = 1;
-		} else {
-			list_move(&slot->mm_node,
-				  &ksm_scan.mm_slot->slot.mm_node);
+	if (slot) {
+		mm_slot = mm_slot_entry(slot, struct ksm_mm_slot, slot);
+		if (ksm_scan.mm_slot != mm_slot) {
+			if (!mm_slot->rmap_list) {
+				hash_del(&slot->hash);
+				list_del(&slot->mm_node);
+				easy_to_free = 1;
+			} else {
+				list_move(&slot->mm_node,
+					  &ksm_scan.mm_slot->slot.mm_node);
+			}
 		}
 	}
 	spin_unlock(&ksm_mmlist_lock);
-- 
2.34.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Patch v2 2/2] mm/khugepaged: remove definition of struct khugepaged_mm_slot
  2025-09-19  7:12 [Patch v2 0/2] mm_slot: fix the usage of mm_slot_entry Wei Yang
  2025-09-19  7:12 ` [Patch v2 1/2] mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL Wei Yang
@ 2025-09-19  7:12 ` Wei Yang
  2025-09-19  7:36   ` David Hildenbrand
  2025-09-20 11:52   ` SeongJae Park
  1 sibling, 2 replies; 16+ messages in thread
From: Wei Yang @ 2025-09-19  7:12 UTC (permalink / raw)
  To: akpm, david, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang, xu.xin16
  Cc: linux-mm, Wei Yang, Kiryl Shutsemau

Current code is not correct to get struct khugepaged_mm_slot by
mm_slot_entry() without checking mm_slot is !NULL. There is no problem
reported since slot is the first element of struct khugepaged_mm_slot.

While struct khugepaged_mm_slot is just a wrapper of struct mm_slot,
there is no need to define it.

Remove the definition of struct khugepaged_mm_slot, so there is not
chance to miss use mm_slot_entry().

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Kiryl Shutsemau <kirill@shutemov.name>
Cc: xu.xin16@zte.com.cn
---
 mm/khugepaged.c | 57 ++++++++++++++++++-------------------------------
 1 file changed, 21 insertions(+), 36 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index e019ea2cbab0..88ea92c64bf0 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -103,14 +103,6 @@ struct collapse_control {
 	nodemask_t alloc_nmask;
 };
 
-/**
- * struct khugepaged_mm_slot - khugepaged information per mm that is being scanned
- * @slot: hash lookup from mm to mm_slot
- */
-struct khugepaged_mm_slot {
-	struct mm_slot slot;
-};
-
 /**
  * struct khugepaged_scan - cursor for scanning
  * @mm_head: the head of the mm list to scan
@@ -121,7 +113,7 @@ struct khugepaged_mm_slot {
  */
 struct khugepaged_scan {
 	struct list_head mm_head;
-	struct khugepaged_mm_slot *mm_slot;
+	struct mm_slot *mm_slot;
 	unsigned long address;
 };
 
@@ -384,7 +376,10 @@ int hugepage_madvise(struct vm_area_struct *vma,
 
 int __init khugepaged_init(void)
 {
-	mm_slot_cache = KMEM_CACHE(khugepaged_mm_slot, 0);
+	mm_slot_cache = kmem_cache_create("khugepaged_mm_slot",
+					  sizeof(struct mm_slot),
+					  __alignof__(struct mm_slot),
+					  0, NULL);
 	if (!mm_slot_cache)
 		return -ENOMEM;
 
@@ -438,7 +433,6 @@ static bool hugepage_pmd_enabled(void)
 
 void __khugepaged_enter(struct mm_struct *mm)
 {
-	struct khugepaged_mm_slot *mm_slot;
 	struct mm_slot *slot;
 	int wakeup;
 
@@ -447,12 +441,10 @@ void __khugepaged_enter(struct mm_struct *mm)
 	if (unlikely(mm_flags_test_and_set(MMF_VM_HUGEPAGE, mm)))
 		return;
 
-	mm_slot = mm_slot_alloc(mm_slot_cache);
-	if (!mm_slot)
+	slot = mm_slot_alloc(mm_slot_cache);
+	if (!slot)
 		return;
 
-	slot = &mm_slot->slot;
-
 	spin_lock(&khugepaged_mm_lock);
 	mm_slot_insert(mm_slots_hash, mm, slot);
 	/*
@@ -480,14 +472,12 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
 
 void __khugepaged_exit(struct mm_struct *mm)
 {
-	struct khugepaged_mm_slot *mm_slot;
 	struct mm_slot *slot;
 	int free = 0;
 
 	spin_lock(&khugepaged_mm_lock);
 	slot = mm_slot_lookup(mm_slots_hash, mm);
-	mm_slot = mm_slot_entry(slot, struct khugepaged_mm_slot, slot);
-	if (mm_slot && khugepaged_scan.mm_slot != mm_slot) {
+	if (slot && khugepaged_scan.mm_slot != slot) {
 		hash_del(&slot->hash);
 		list_del(&slot->mm_node);
 		free = 1;
@@ -496,9 +486,9 @@ void __khugepaged_exit(struct mm_struct *mm)
 
 	if (free) {
 		mm_flags_clear(MMF_VM_HUGEPAGE, mm);
-		mm_slot_free(mm_slot_cache, mm_slot);
+		mm_slot_free(mm_slot_cache, slot);
 		mmdrop(mm);
-	} else if (mm_slot) {
+	} else if (slot) {
 		/*
 		 * This is required to serialize against
 		 * hpage_collapse_test_exit() (which is guaranteed to run
@@ -1432,9 +1422,8 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 	return result;
 }
 
-static void collect_mm_slot(struct khugepaged_mm_slot *mm_slot)
+static void collect_mm_slot(struct mm_slot *slot)
 {
-	struct mm_slot *slot = &mm_slot->slot;
 	struct mm_struct *mm = slot->mm;
 
 	lockdep_assert_held(&khugepaged_mm_lock);
@@ -1451,7 +1440,7 @@ static void collect_mm_slot(struct khugepaged_mm_slot *mm_slot)
 		 */
 
 		/* khugepaged_mm_lock actually not necessary for the below */
-		mm_slot_free(mm_slot_cache, mm_slot);
+		mm_slot_free(mm_slot_cache, slot);
 		mmdrop(mm);
 	}
 }
@@ -2376,7 +2365,6 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 	__acquires(&khugepaged_mm_lock)
 {
 	struct vma_iterator vmi;
-	struct khugepaged_mm_slot *mm_slot;
 	struct mm_slot *slot;
 	struct mm_struct *mm;
 	struct vm_area_struct *vma;
@@ -2387,14 +2375,12 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 	*result = SCAN_FAIL;
 
 	if (khugepaged_scan.mm_slot) {
-		mm_slot = khugepaged_scan.mm_slot;
-		slot = &mm_slot->slot;
+		slot = khugepaged_scan.mm_slot;
 	} else {
 		slot = list_first_entry(&khugepaged_scan.mm_head,
 				     struct mm_slot, mm_node);
-		mm_slot = mm_slot_entry(slot, struct khugepaged_mm_slot, slot);
 		khugepaged_scan.address = 0;
-		khugepaged_scan.mm_slot = mm_slot;
+		khugepaged_scan.mm_slot = slot;
 	}
 	spin_unlock(&khugepaged_mm_lock);
 
@@ -2492,7 +2478,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 breakouterloop_mmap_lock:
 
 	spin_lock(&khugepaged_mm_lock);
-	VM_BUG_ON(khugepaged_scan.mm_slot != mm_slot);
+	VM_BUG_ON(khugepaged_scan.mm_slot != slot);
 	/*
 	 * Release the current mm_slot if this mm is about to die, or
 	 * if we scanned all vmas of this mm.
@@ -2505,15 +2491,14 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 		 */
 		if (!list_is_last(&slot->mm_node, &khugepaged_scan.mm_head)) {
 			slot = list_next_entry(slot, mm_node);
-			khugepaged_scan.mm_slot =
-				mm_slot_entry(slot, struct khugepaged_mm_slot, slot);
+			khugepaged_scan.mm_slot = slot;
 			khugepaged_scan.address = 0;
 		} else {
 			khugepaged_scan.mm_slot = NULL;
 			khugepaged_full_scans++;
 		}
 
-		collect_mm_slot(mm_slot);
+		collect_mm_slot(slot);
 	}
 
 	return progress;
@@ -2600,7 +2585,7 @@ static void khugepaged_wait_work(void)
 
 static int khugepaged(void *none)
 {
-	struct khugepaged_mm_slot *mm_slot;
+	struct mm_slot *slot;
 
 	set_freezable();
 	set_user_nice(current, MAX_NICE);
@@ -2611,10 +2596,10 @@ static int khugepaged(void *none)
 	}
 
 	spin_lock(&khugepaged_mm_lock);
-	mm_slot = khugepaged_scan.mm_slot;
+	slot = khugepaged_scan.mm_slot;
 	khugepaged_scan.mm_slot = NULL;
-	if (mm_slot)
-		collect_mm_slot(mm_slot);
+	if (slot)
+		collect_mm_slot(slot);
 	spin_unlock(&khugepaged_mm_lock);
 	return 0;
 }
-- 
2.34.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2 1/2] mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL
  2025-09-19  7:12 ` [Patch v2 1/2] mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL Wei Yang
@ 2025-09-19  7:24   ` David Hildenbrand
  2025-09-19  7:38   ` Dev Jain
  2025-09-19  7:44   ` Lance Yang
  2 siblings, 0 replies; 16+ messages in thread
From: David Hildenbrand @ 2025-09-19  7:24 UTC (permalink / raw)
  To: Wei Yang, akpm, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang, xu.xin16
  Cc: linux-mm, Kiryl Shutsemau

On 19.09.25 09:12, Wei Yang wrote:
> When using mm_slot in ksm, there is code snip like:
> 
>       slot = mm_slot_lookup(mm_slots_hash, mm);
>       mm_slot = mm_slot_entry(slot, struct ksm_mm_slot, slot);
>       if (mm_slot && ..) {
>       }
> 
> The mm_slot_entry() won't return a valid value if slot is NULL
> generally. But currently it works since slot is the first element of
> struct ksm_mm_slot.
> 
> To reduce the ambiguity and make it robust, access mm_slot_entry() when
> slot is !NULL.
> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Cc: Lance Yang <lance.yang@linux.dev>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Dev Jain <dev.jain@arm.com>
> Cc: Kiryl Shutsemau <kirill@shutemov.name>
> Cc: xu.xin16@zte.com.cn
> 
> ---
> v2:
>    * check on slot before continue
> ---

As discussed, we might want to do some other cleanups in that areas 
(e.g., slot never being NULL?). But this here looks cleaner to me for 
the time being.

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2 2/2] mm/khugepaged: remove definition of struct khugepaged_mm_slot
  2025-09-19  7:12 ` [Patch v2 2/2] mm/khugepaged: remove definition of struct khugepaged_mm_slot Wei Yang
@ 2025-09-19  7:36   ` David Hildenbrand
  2025-09-22 13:17     ` Nico Pache
  2025-09-20 11:52   ` SeongJae Park
  1 sibling, 1 reply; 16+ messages in thread
From: David Hildenbrand @ 2025-09-19  7:36 UTC (permalink / raw)
  To: Wei Yang, akpm, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang, xu.xin16,
	Nico Pache
  Cc: linux-mm, Kiryl Shutsemau

On 19.09.25 09:12, Wei Yang wrote:
> Current code is not correct to get struct khugepaged_mm_slot by
> mm_slot_entry() without checking mm_slot is !NULL. There is no problem
> reported since slot is the first element of struct khugepaged_mm_slot.
> 
> While struct khugepaged_mm_slot is just a wrapper of struct mm_slot,
> there is no need to define it.
> 
> Remove the definition of struct khugepaged_mm_slot, so there is not
> chance to miss use mm_slot_entry().
> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Cc: Lance Yang <lance.yang@linux.dev>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Dev Jain <dev.jain@arm.com>
> Cc: Kiryl Shutsemau <kirill@shutemov.name>
> Cc: xu.xin16@zte.com.cn
> ---
>   mm/khugepaged.c | 57 ++++++++++++++++++-------------------------------
>   1 file changed, 21 insertions(+), 36 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index e019ea2cbab0..88ea92c64bf0 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -103,14 +103,6 @@ struct collapse_control {
>   	nodemask_t alloc_nmask;
>   };
>   
> -/**
> - * struct khugepaged_mm_slot - khugepaged information per mm that is being scanned
> - * @slot: hash lookup from mm to mm_slot
> - */
> -struct khugepaged_mm_slot {
> -	struct mm_slot slot;
> -};
> -

Looking into the details, we remove the last entries from this member in 
d50791c2bee9 ("mm/khugepaged: delete 
khugepaged_collapse_pte_mapped_thps()").

@Nico did you have any use case in one of your scanning-optimizing 
prototypes for khugepaged_mm_slot?

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2 1/2] mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL
  2025-09-19  7:12 ` [Patch v2 1/2] mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL Wei Yang
  2025-09-19  7:24   ` David Hildenbrand
@ 2025-09-19  7:38   ` Dev Jain
  2025-09-19  7:44   ` Lance Yang
  2 siblings, 0 replies; 16+ messages in thread
From: Dev Jain @ 2025-09-19  7:38 UTC (permalink / raw)
  To: Wei Yang, akpm, david, lorenzo.stoakes, ziy, baolin.wang,
	Liam.Howlett, npache, ryan.roberts, baohua, lance.yang, xu.xin16
  Cc: linux-mm, Kiryl Shutsemau


On 19/09/25 12:42 pm, Wei Yang wrote:
> When using mm_slot in ksm, there is code snip like:
>
>       slot = mm_slot_lookup(mm_slots_hash, mm);
>       mm_slot = mm_slot_entry(slot, struct ksm_mm_slot, slot);
>       if (mm_slot && ..) {
>       }
>
> The mm_slot_entry() won't return a valid value if slot is NULL
> generally. But currently it works since slot is the first element of
> struct ksm_mm_slot.
>
> To reduce the ambiguity and make it robust, access mm_slot_entry() when
> slot is !NULL.
>
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>

Reviewed-by: Dev Jain <dev.jain@arm.com>



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2 1/2] mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL
  2025-09-19  7:12 ` [Patch v2 1/2] mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL Wei Yang
  2025-09-19  7:24   ` David Hildenbrand
  2025-09-19  7:38   ` Dev Jain
@ 2025-09-19  7:44   ` Lance Yang
  2 siblings, 0 replies; 16+ messages in thread
From: Lance Yang @ 2025-09-19  7:44 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, Kiryl Shutsemau, akpm, david, dev.jain, ziy, xu.xin16,
	baolin.wang, Liam.Howlett, lorenzo.stoakes, ryan.roberts, baohua,
	npache



On 2025/9/19 15:12, Wei Yang wrote:
> When using mm_slot in ksm, there is code snip like:
> 
>       slot = mm_slot_lookup(mm_slots_hash, mm);
>       mm_slot = mm_slot_entry(slot, struct ksm_mm_slot, slot);
>       if (mm_slot && ..) {
>       }
> 
> The mm_slot_entry() won't return a valid value if slot is NULL
> generally. But currently it works since slot is the first element of
> struct ksm_mm_slot.
> 
> To reduce the ambiguity and make it robust, access mm_slot_entry() when
> slot is !NULL.
> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>

LGTM.

Reviewed-by: Lance Yang <lance.yang@linux.dev>

> Cc: Lance Yang <lance.yang@linux.dev>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Dev Jain <dev.jain@arm.com>
> Cc: Kiryl Shutsemau <kirill@shutemov.name>
> Cc: xu.xin16@zte.com.cn
> 
> ---
> v2:
>    * check on slot before continue
> ---
>   mm/ksm.c | 20 +++++++++++---------
>   1 file changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 2ef29802a49b..6ff7f840e50a 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -2939,15 +2939,17 @@ void __ksm_exit(struct mm_struct *mm)
>   
>   	spin_lock(&ksm_mmlist_lock);
>   	slot = mm_slot_lookup(mm_slots_hash, mm);
> -	mm_slot = mm_slot_entry(slot, struct ksm_mm_slot, slot);
> -	if (mm_slot && ksm_scan.mm_slot != mm_slot) {
> -		if (!mm_slot->rmap_list) {
> -			hash_del(&slot->hash);
> -			list_del(&slot->mm_node);
> -			easy_to_free = 1;
> -		} else {
> -			list_move(&slot->mm_node,
> -				  &ksm_scan.mm_slot->slot.mm_node);
> +	if (slot) {
> +		mm_slot = mm_slot_entry(slot, struct ksm_mm_slot, slot);
> +		if (ksm_scan.mm_slot != mm_slot) {
> +			if (!mm_slot->rmap_list) {
> +				hash_del(&slot->hash);
> +				list_del(&slot->mm_node);
> +				easy_to_free = 1;
> +			} else {
> +				list_move(&slot->mm_node,
> +					  &ksm_scan.mm_slot->slot.mm_node);
> +			}
>   		}
>   	}
>   	spin_unlock(&ksm_mmlist_lock);



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2 2/2] mm/khugepaged: remove definition of struct khugepaged_mm_slot
  2025-09-19  7:12 ` [Patch v2 2/2] mm/khugepaged: remove definition of struct khugepaged_mm_slot Wei Yang
  2025-09-19  7:36   ` David Hildenbrand
@ 2025-09-20 11:52   ` SeongJae Park
  2025-09-20 12:29     ` Wei Yang
  2025-09-21 16:07     ` Lance Yang
  1 sibling, 2 replies; 16+ messages in thread
From: SeongJae Park @ 2025-09-20 11:52 UTC (permalink / raw)
  To: Wei Yang
  Cc: SeongJae Park, akpm, david, lorenzo.stoakes, ziy, baolin.wang,
	Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, lance.yang,
	xu.xin16, linux-mm, Kiryl Shutsemau

Hello,

On Fri, 19 Sep 2025 07:12:44 +0000 Wei Yang <richard.weiyang@gmail.com> wrote:

> Current code is not correct to get struct khugepaged_mm_slot by
> mm_slot_entry() without checking mm_slot is !NULL. There is no problem
> reported since slot is the first element of struct khugepaged_mm_slot.
> 
> While struct khugepaged_mm_slot is just a wrapper of struct mm_slot,
> there is no need to define it.
> 
> Remove the definition of struct khugepaged_mm_slot, so there is not
> chance to miss use mm_slot_entry().
> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Cc: Lance Yang <lance.yang@linux.dev>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Dev Jain <dev.jain@arm.com>
> Cc: Kiryl Shutsemau <kirill@shutemov.name>
> Cc: xu.xin16@zte.com.cn
> ---
>  mm/khugepaged.c | 57 ++++++++++++++++++-------------------------------
>  1 file changed, 21 insertions(+), 36 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index e019ea2cbab0..88ea92c64bf0 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
[...]
> @@ -2376,7 +2365,6 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>  	__acquires(&khugepaged_mm_lock)
>  {
>  	struct vma_iterator vmi;
> -	struct khugepaged_mm_slot *mm_slot;
>  	struct mm_slot *slot;
>  	struct mm_struct *mm;
>  	struct vm_area_struct *vma;
> @@ -2387,14 +2375,12 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>  	*result = SCAN_FAIL;
>  
>  	if (khugepaged_scan.mm_slot) {
> -		mm_slot = khugepaged_scan.mm_slot;
> -		slot = &mm_slot->slot;
> +		slot = khugepaged_scan.mm_slot;
>  	} else {
>  		slot = list_first_entry(&khugepaged_scan.mm_head,
>  				     struct mm_slot, mm_node);
> -		mm_slot = mm_slot_entry(slot, struct khugepaged_mm_slot, slot);
>  		khugepaged_scan.address = 0;
> -		khugepaged_scan.mm_slot = mm_slot;
> +		khugepaged_scan.mm_slot = slot;
>  	}
>  	spin_unlock(&khugepaged_mm_lock);
>  
> @@ -2492,7 +2478,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>  breakouterloop_mmap_lock:
>  
>  	spin_lock(&khugepaged_mm_lock);
> -	VM_BUG_ON(khugepaged_scan.mm_slot != mm_slot);
> +	VM_BUG_ON(khugepaged_scan.mm_slot != slot);
>  	/*
>  	 * Release the current mm_slot if this mm is about to die, or
>  	 * if we scanned all vmas of this mm.
> @@ -2505,15 +2491,14 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>  		 */
>  		if (!list_is_last(&slot->mm_node, &khugepaged_scan.mm_head)) {
>  			slot = list_next_entry(slot, mm_node);
> -			khugepaged_scan.mm_slot =
> -				mm_slot_entry(slot, struct khugepaged_mm_slot, slot);
> +			khugepaged_scan.mm_slot = slot;
>  			khugepaged_scan.address = 0;
>  		} else {
>  			khugepaged_scan.mm_slot = NULL;
>  			khugepaged_full_scans++;
>  		}
>  
> -		collect_mm_slot(mm_slot);
> +		collect_mm_slot(slot);
>  	}
>  
>  	return progress;
> @@ -2600,7 +2585,7 @@ static void khugepaged_wait_work(void)
>  
>  static int khugepaged(void *none)
>  {
> -	struct khugepaged_mm_slot *mm_slot;
> +	struct mm_slot *slot;
>  
>  	set_freezable();
>  	set_user_nice(current, MAX_NICE);
> @@ -2611,10 +2596,10 @@ static int khugepaged(void *none)
>  	}
>  
>  	spin_lock(&khugepaged_mm_lock);
> -	mm_slot = khugepaged_scan.mm_slot;
> +	slot = khugepaged_scan.mm_slot;
>  	khugepaged_scan.mm_slot = NULL;
> -	if (mm_slot)
> -		collect_mm_slot(mm_slot);
> +	if (slot)
> +		collect_mm_slot(slot);
>  	spin_unlock(&khugepaged_mm_lock);
>  	return 0;
>  }
> -- 
> 2.34.1
> 
> 
> 

On latest mm-new tree, I am getting below error while building UML mode kernel
for kunit.  And 'git bisect' points me this patch.  I'm not familiar with this
code and have no time to dive deep for now, so reporting first.

Oops: general protection fault, probably for non-canonical adI
[  356.456907] CPU: 34 UID: 0 PID: 309 Comm: khugepaged Not tainted 6.17.0-rc4+ #370 PREEMPT(voluntary)
[  356.457702] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-4.el9 04/01/2014
[  356.458484] RIP: 0010:collect_mm_slot (mm/khugepaged.c:1427)
[  356.458904] Code: 48 89 df 5b e9 1a 29 f3 ff 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 908

Code starting with the faulting instruction
===========================================
   0:   48 89 df                mov    %rbx,%rdi
   3:   5b                      pop    %rbx
   4:   e9 1a 29 f3 ff          jmp    0xfffffffffff32923
   9:   66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
  10:   00 00 00
  13:   90                      nop
  14:   90                      nop
  15:   90                      nop
  16:   90                      nop
  17:   90                      nop
  18:   90                      nop
  19:   90                      nop
  1a:   90                      nop
  1b:   90                      nop
  1c:   90                      nop
  1d:   08                      .byte 0x8
[  356.460685] RSP: 0018:ffffb61a46e37df8 EFLAGS: 00010286
[  356.461115] RAX: e1bca96613f6fe2b RBX: 0000000000000000 RCX: 8000000000000007
[  356.461692] RDX: 0000000000000001 RSI: ffffeba0443b2600 RDI: e1bca96613f6fe2b
[  356.462269] RBP: 00000000000000f2 R08: ffff8ea80ec9aa00 R09: 0000000080150001
[  356.462842] R10: 000000008015000e R11: 0000000000000000 R12: ffff8ea80ec9aa00
[  356.463574] R13: 00000000000001e5 R14: 0000000000000001 R15: ffffb61a46e37e60
[  356.464249] FS:  0000000000000000(0000) GS:ffff8eaf13dd1000(0000) knlGS:0000000000000000
[  356.465070] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  356.465578] CR2: 00007f8e913e2a1c CR3: 00000008cb022000 CR4: 00000000000006f0
[  356.466185] Call Trace:
[  356.466398]  <TASK>
[  356.466576]  khugepaged (mm/khugepaged.c:2519 mm/khugepaged.c:2556 mm/khugepaged.c:2612)
[  356.466869]  ? __pfx_khugepaged (mm/khugepaged.c:2605)
[  356.467284]  kthread (kernel/kthread.c:463)
[  356.467592]  ? finish_task_switch.isra.0 (arch/x86/include/asm/paravirt.h:671 kernel/sched/sched.h:1531 kernel/sched/core.c:5105 kernel/sched/core.c:5223)
[  356.468068]  ? __pfx_kthread (kernel/kthread.c:412)
[  356.468480]  ret_from_fork (arch/x86/kernel/process.c:154)
[  356.468849]  ? __pfx_kthread (kernel/kthread.c:412)
[  356.469223]  ret_from_fork_asm (arch/x86/entry/entry_64.S:258)
[  356.469591]  </TASK>
[  356.469778] Modules linked in: binfmt_misc ppdev parport_pc parport pcspkr evdev joydev button serio_raw sgn
[  356.473304] Dumping ftrace buffer:
[  356.473618]    (ftrace buffer empty)
[  356.473966] ---[ end trace 0000000000000000 ]---
[  356.474506] RIP: 0010:collect_mm_slot (mm/khugepaged.c:1427)
[  356.475142] Code: 48 89 df 5b e9 1a 29 f3 ff 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 908

Code starting with the faulting instruction
===========================================
   0:   48 89 df                mov    %rbx,%rdi
   3:   5b                      pop    %rbx
   4:   e9 1a 29 f3 ff          jmp    0xfffffffffff32923
   9:   66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
  10:   00 00 00
  13:   90                      nop
  14:   90                      nop
  15:   90                      nop
  16:   90                      nop
  17:   90                      nop
  18:   90                      nop
  19:   90                      nop
  1a:   90                      nop
  1b:   90                      nop
  1c:   90                      nop
  1d:   08                      .byte 0x8
[  356.478405] RSP: 0018:ffffb61a46e37df8 EFLAGS: 00010286
[  356.478935] RAX: e1bca96613f6fe2b RBX: 0000000000000000 RCX: 8000000000000007
[  356.479763] RDX: 0000000000000001 RSI: ffffeba0443b2600 RDI: e1bca96613f6fe2b
[  356.480722] RBP: 00000000000000f2 R08: ffff8ea80ec9aa00 R09: 0000000080150001
[  356.481703] R10: 000000008015000e R11: 0000000000000000 R12: ffff8ea80ec9aa00
[  356.482402] R13: 00000000000001e5 R14: 0000000000000001 R15: ffffb61a46e37e60
[  356.483060] FS:  0000000000000000(0000) GS:ffff8eaf13dd1000(0000) knlGS:0000000000000000
[  356.484027] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  356.484861] CR2: 00007f8e913e2a1c CR3: 00000008cb022000 CR4: 00000000000006f0
[  356.485559] note: khugepaged[309] exited with preempt_count 1


Thanks,
SJ


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2 2/2] mm/khugepaged: remove definition of struct khugepaged_mm_slot
  2025-09-20 11:52   ` SeongJae Park
@ 2025-09-20 12:29     ` Wei Yang
  2025-09-20 13:41       ` SeongJae Park
  2025-09-21 16:07     ` Lance Yang
  1 sibling, 1 reply; 16+ messages in thread
From: Wei Yang @ 2025-09-20 12:29 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Wei Yang, akpm, david, lorenzo.stoakes, ziy, baolin.wang,
	Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, lance.yang,
	xu.xin16, linux-mm, Kiryl Shutsemau

On Sat, Sep 20, 2025 at 04:52:33AM -0700, SeongJae Park wrote:
[...]
>
>On latest mm-new tree, I am getting below error while building UML mode kernel
>for kunit.  And 'git bisect' points me this patch.  I'm not familiar with this
>code and have no time to dive deep for now, so reporting first.

Thanks for reporting.

>
>Oops: general protection fault, probably for non-canonical adI
>[  356.456907] CPU: 34 UID: 0 PID: 309 Comm: khugepaged Not tainted 6.17.0-rc4+ #370 PREEMPT(voluntary)
>[  356.457702] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-4.el9 04/01/2014
>[  356.458484] RIP: 0010:collect_mm_slot (mm/khugepaged.c:1427)

If my understanding is correct, the error happens in following code flow:

  khugepaged_scan_mm_slot()
    mm = slot->mm;                (1)
    collect_mm_slot()
      mm = slot->mm;              (2)

Looks the reason is slot is NULL at (2), but we have already accessed it at (1).

Not find the cause yet.

Would you mind sharing your UML config and test step? So that I can reproduce
it.

>[  356.458904] Code: 48 89 df 5b e9 1a 29 f3 ff 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 908
>
>Code starting with the faulting instruction
>===========================================
>   0:   48 89 df                mov    %rbx,%rdi
>   3:   5b                      pop    %rbx
>   4:   e9 1a 29 f3 ff          jmp    0xfffffffffff32923
>   9:   66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
>  10:   00 00 00
>  13:   90                      nop
>  14:   90                      nop
>  15:   90                      nop
>  16:   90                      nop
>  17:   90                      nop
>  18:   90                      nop
>  19:   90                      nop
>  1a:   90                      nop
>  1b:   90                      nop
>  1c:   90                      nop
>  1d:   08                      .byte 0x8
>[  356.460685] RSP: 0018:ffffb61a46e37df8 EFLAGS: 00010286
>[  356.461115] RAX: e1bca96613f6fe2b RBX: 0000000000000000 RCX: 8000000000000007
>[  356.461692] RDX: 0000000000000001 RSI: ffffeba0443b2600 RDI: e1bca96613f6fe2b
>[  356.462269] RBP: 00000000000000f2 R08: ffff8ea80ec9aa00 R09: 0000000080150001
>[  356.462842] R10: 000000008015000e R11: 0000000000000000 R12: ffff8ea80ec9aa00
>[  356.463574] R13: 00000000000001e5 R14: 0000000000000001 R15: ffffb61a46e37e60
>[  356.464249] FS:  0000000000000000(0000) GS:ffff8eaf13dd1000(0000) knlGS:0000000000000000
>[  356.465070] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>[  356.465578] CR2: 00007f8e913e2a1c CR3: 00000008cb022000 CR4: 00000000000006f0
>[  356.466185] Call Trace:
>[  356.466398]  <TASK>
>[  356.466576]  khugepaged (mm/khugepaged.c:2519 mm/khugepaged.c:2556 mm/khugepaged.c:2612)
>[  356.466869]  ? __pfx_khugepaged (mm/khugepaged.c:2605)
>[  356.467284]  kthread (kernel/kthread.c:463)
>[  356.467592]  ? finish_task_switch.isra.0 (arch/x86/include/asm/paravirt.h:671 kernel/sched/sched.h:1531 kernel/sched/core.c:5105 kernel/sched/core.c:5223)
>[  356.468068]  ? __pfx_kthread (kernel/kthread.c:412)
>[  356.468480]  ret_from_fork (arch/x86/kernel/process.c:154)
>[  356.468849]  ? __pfx_kthread (kernel/kthread.c:412)
>[  356.469223]  ret_from_fork_asm (arch/x86/entry/entry_64.S:258)
>[  356.469591]  </TASK>
>[  356.469778] Modules linked in: binfmt_misc ppdev parport_pc parport pcspkr evdev joydev button serio_raw sgn
>[  356.473304] Dumping ftrace buffer:
>[  356.473618]    (ftrace buffer empty)
>[  356.473966] ---[ end trace 0000000000000000 ]---
>[  356.474506] RIP: 0010:collect_mm_slot (mm/khugepaged.c:1427)
>[  356.475142] Code: 48 89 df 5b e9 1a 29 f3 ff 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 908
>
>Code starting with the faulting instruction
>===========================================
>   0:   48 89 df                mov    %rbx,%rdi
>   3:   5b                      pop    %rbx
>   4:   e9 1a 29 f3 ff          jmp    0xfffffffffff32923
>   9:   66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
>  10:   00 00 00
>  13:   90                      nop
>  14:   90                      nop
>  15:   90                      nop
>  16:   90                      nop
>  17:   90                      nop
>  18:   90                      nop
>  19:   90                      nop
>  1a:   90                      nop
>  1b:   90                      nop
>  1c:   90                      nop
>  1d:   08                      .byte 0x8
>[  356.478405] RSP: 0018:ffffb61a46e37df8 EFLAGS: 00010286
>[  356.478935] RAX: e1bca96613f6fe2b RBX: 0000000000000000 RCX: 8000000000000007
>[  356.479763] RDX: 0000000000000001 RSI: ffffeba0443b2600 RDI: e1bca96613f6fe2b
>[  356.480722] RBP: 00000000000000f2 R08: ffff8ea80ec9aa00 R09: 0000000080150001
>[  356.481703] R10: 000000008015000e R11: 0000000000000000 R12: ffff8ea80ec9aa00
>[  356.482402] R13: 00000000000001e5 R14: 0000000000000001 R15: ffffb61a46e37e60
>[  356.483060] FS:  0000000000000000(0000) GS:ffff8eaf13dd1000(0000) knlGS:0000000000000000
>[  356.484027] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>[  356.484861] CR2: 00007f8e913e2a1c CR3: 00000008cb022000 CR4: 00000000000006f0
>[  356.485559] note: khugepaged[309] exited with preempt_count 1
>
>
>Thanks,
>SJ

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2 2/2] mm/khugepaged: remove definition of struct khugepaged_mm_slot
  2025-09-20 12:29     ` Wei Yang
@ 2025-09-20 13:41       ` SeongJae Park
  2025-09-21 15:08         ` Wei Yang
  0 siblings, 1 reply; 16+ messages in thread
From: SeongJae Park @ 2025-09-20 13:41 UTC (permalink / raw)
  To: Wei Yang
  Cc: SeongJae Park, akpm, david, lorenzo.stoakes, ziy, baolin.wang,
	Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, lance.yang,
	xu.xin16, linux-mm, Kiryl Shutsemau

On Sat, 20 Sep 2025 12:29:58 +0000 Wei Yang <richard.weiyang@gmail.com> wrote:

> On Sat, Sep 20, 2025 at 04:52:33AM -0700, SeongJae Park wrote:
> [...]
> >
> >On latest mm-new tree, I am getting below error while building UML mode kernel
> >for kunit.  And 'git bisect' points me this patch.  I'm not familiar with this
> >code and have no time to dive deep for now, so reporting first.
> 
> Thanks for reporting.

Thank you for fast reply!

> 
> >
> >Oops: general protection fault, probably for non-canonical adI
> >[  356.456907] CPU: 34 UID: 0 PID: 309 Comm: khugepaged Not tainted 6.17.0-rc4+ #370 PREEMPT(voluntary)
> >[  356.457702] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-4.el9 04/01/2014
> >[  356.458484] RIP: 0010:collect_mm_slot (mm/khugepaged.c:1427)
> 
> If my understanding is correct, the error happens in following code flow:
> 
>   khugepaged_scan_mm_slot()
>     mm = slot->mm;                (1)
>     collect_mm_slot()
>       mm = slot->mm;              (2)
> 
> Looks the reason is slot is NULL at (2), but we have already accessed it at (1).
> 
> Not find the cause yet.
> 
> Would you mind sharing your UML config and test step? So that I can reproduce
> it.

To reproduce the issue, I do below:

    $ ./tools/testing/kunit/kunit.py run --kunitconfig mm/damon/tests

Then the OOPs happens a few seconds aftr the kunit run, or during the
kunit-purpose UML build.  The issue sometimes doesn't happen or take time more
than seconds.  Since the code is related with khugepaged, apparently there are
some timing factors.  But I was able to reproduce about 3/4 times.

It was also reproducible when I build and install kernel on my QEMU machine
using 'make install'.


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2 2/2] mm/khugepaged: remove definition of struct khugepaged_mm_slot
  2025-09-20 13:41       ` SeongJae Park
@ 2025-09-21 15:08         ` Wei Yang
  2025-09-22  9:33           ` SeongJae Park
  0 siblings, 1 reply; 16+ messages in thread
From: Wei Yang @ 2025-09-21 15:08 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Wei Yang, akpm, david, lorenzo.stoakes, ziy, baolin.wang,
	Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, lance.yang,
	xu.xin16, linux-mm, Kiryl Shutsemau

On Sat, Sep 20, 2025 at 06:41:30AM -0700, SeongJae Park wrote:
>On Sat, 20 Sep 2025 12:29:58 +0000 Wei Yang <richard.weiyang@gmail.com> wrote:
>
>> On Sat, Sep 20, 2025 at 04:52:33AM -0700, SeongJae Park wrote:
>> [...]
>> >
>> >On latest mm-new tree, I am getting below error while building UML mode kernel
>> >for kunit.  And 'git bisect' points me this patch.  I'm not familiar with this
>> >code and have no time to dive deep for now, so reporting first.
>> 
>> Thanks for reporting.
>
>Thank you for fast reply!
>
>> 
>> >
>> >Oops: general protection fault, probably for non-canonical adI
>> >[  356.456907] CPU: 34 UID: 0 PID: 309 Comm: khugepaged Not tainted 6.17.0-rc4+ #370 PREEMPT(voluntary)
>> >[  356.457702] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-4.el9 04/01/2014
>> >[  356.458484] RIP: 0010:collect_mm_slot (mm/khugepaged.c:1427)
>> 
>> If my understanding is correct, the error happens in following code flow:
>> 
>>   khugepaged_scan_mm_slot()
>>     mm = slot->mm;                (1)
>>     collect_mm_slot()
>>       mm = slot->mm;              (2)
>> 
>> Looks the reason is slot is NULL at (2), but we have already accessed it at (1).
>> 
>> Not find the cause yet.
>> 
>> Would you mind sharing your UML config and test step? So that I can reproduce
>> it.
>
>To reproduce the issue, I do below:
>
>    $ ./tools/testing/kunit/kunit.py run --kunitconfig mm/damon/tests
>
>Then the OOPs happens a few seconds aftr the kunit run, or during the
>kunit-purpose UML build.  The issue sometimes doesn't happen or take time more
>than seconds.  Since the code is related with khugepaged, apparently there are
>some timing factors.  But I was able to reproduce about 3/4 times.

I run Qemu with kernel built from mm-new base commit b8086c280108
"drivers/base: move memory_block_add_nid() into the caller". Then run the
kunit command above.

Not see error for around 20 times of the test.

Is my step correct?

>
>It was also reproducible when I build and install kernel on my QEMU machine
>using 'make install'.
>

Run qemu with the same kernel and do "make clean && make all" several times.

Not spot errors yet.

kselftests/mm/khugepaged pass with this kernel.

Will do kernel build for more time to see the effect.

If my step is not correct, please let me know.

>
>Thanks,
>SJ
>
>[...]

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2 2/2] mm/khugepaged: remove definition of struct khugepaged_mm_slot
  2025-09-20 11:52   ` SeongJae Park
  2025-09-20 12:29     ` Wei Yang
@ 2025-09-21 16:07     ` Lance Yang
  2025-09-22  0:28       ` Wei Yang
  1 sibling, 1 reply; 16+ messages in thread
From: Lance Yang @ 2025-09-21 16:07 UTC (permalink / raw)
  To: Wei Yang, SeongJae Park
  Cc: akpm, david, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, xu.xin16, linux-mm,
	Kiryl Shutsemau

Good catch!

Looking at the crash report, this seems like a use-after-free bug
introduced in khugepaged_scan_mm_slot(). See below please.

On 2025/9/20 19:52, SeongJae Park wrote:
> Hello,
> 
> On Fri, 19 Sep 2025 07:12:44 +0000 Wei Yang <richard.weiyang@gmail.com> wrote:
> 
>> Current code is not correct to get struct khugepaged_mm_slot by
>> mm_slot_entry() without checking mm_slot is !NULL. There is no problem
>> reported since slot is the first element of struct khugepaged_mm_slot.
>>
>> While struct khugepaged_mm_slot is just a wrapper of struct mm_slot,
>> there is no need to define it.
>>
>> Remove the definition of struct khugepaged_mm_slot, so there is not
>> chance to miss use mm_slot_entry().
>>
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> Cc: Lance Yang <lance.yang@linux.dev>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Dev Jain <dev.jain@arm.com>
>> Cc: Kiryl Shutsemau <kirill@shutemov.name>
>> Cc: xu.xin16@zte.com.cn
>> ---
>>   mm/khugepaged.c | 57 ++++++++++++++++++-------------------------------
>>   1 file changed, 21 insertions(+), 36 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index e019ea2cbab0..88ea92c64bf0 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
> [...]
>> @@ -2376,7 +2365,6 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>>   	__acquires(&khugepaged_mm_lock)
>>   {
>>   	struct vma_iterator vmi;
>> -	struct khugepaged_mm_slot *mm_slot;
>>   	struct mm_slot *slot;
>>   	struct mm_struct *mm;
>>   	struct vm_area_struct *vma;
>> @@ -2387,14 +2375,12 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>>   	*result = SCAN_FAIL;
>>   
>>   	if (khugepaged_scan.mm_slot) {
>> -		mm_slot = khugepaged_scan.mm_slot;
>> -		slot = &mm_slot->slot;
>> +		slot = khugepaged_scan.mm_slot;
>>   	} else {
>>   		slot = list_first_entry(&khugepaged_scan.mm_head,
>>   				     struct mm_slot, mm_node);
>> -		mm_slot = mm_slot_entry(slot, struct khugepaged_mm_slot, slot);
>>   		khugepaged_scan.address = 0;
>> -		khugepaged_scan.mm_slot = mm_slot;
>> +		khugepaged_scan.mm_slot = slot;
>>   	}
>>   	spin_unlock(&khugepaged_mm_lock);
>>   
>> @@ -2492,7 +2478,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>>   breakouterloop_mmap_lock:
>>   
>>   	spin_lock(&khugepaged_mm_lock);
>> -	VM_BUG_ON(khugepaged_scan.mm_slot != mm_slot);
>> +	VM_BUG_ON(khugepaged_scan.mm_slot != slot);
>>   	/*
>>   	 * Release the current mm_slot if this mm is about to die, or
>>   	 * if we scanned all vmas of this mm.
>> @@ -2505,15 +2491,14 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>>   		 */
>>   		if (!list_is_last(&slot->mm_node, &khugepaged_scan.mm_head)) {
>>   			slot = list_next_entry(slot, mm_node);

In the original code, we used two distinct local variables.

1) struct khugepaged_mm_slot *mm_slot:
mm_slot consistently pointed to the item being processed in the
current call.

2) struct mm_slot *slot:
The local slot pointer could be advanced to the next item.

>> -			khugepaged_scan.mm_slot =
>> -				mm_slot_entry(slot, struct khugepaged_mm_slot, slot);
>> +			khugepaged_scan.mm_slot = slot;
>>   			khugepaged_scan.address = 0;
>>   		} else {
>>   			khugepaged_scan.mm_slot = NULL;
>>   			khugepaged_full_scans++;
>>   		}
>>   
>> -		collect_mm_slot(mm_slot);

At the end, collect_mm_slot(mm_slot) correctly operated on the
original item for that scan.

>> +		collect_mm_slot(slot);

However, this patch merges these two into a single slot variable.

When slot = list_next_entry(slot, mm_node); is called, the slot
pointer is updated to the next item.

Passing this new pointer to collect_mm_slot() then causes a
use-after-free on the following iteration, IIUC.

Cheers,
Lance



>>   	}
>>   
>>   	return progress;
>> @@ -2600,7 +2585,7 @@ static void khugepaged_wait_work(void)
>>   
>>   static int khugepaged(void *none)
>>   {
>> -	struct khugepaged_mm_slot *mm_slot;
>> +	struct mm_slot *slot;
>>   
>>   	set_freezable();
>>   	set_user_nice(current, MAX_NICE);
>> @@ -2611,10 +2596,10 @@ static int khugepaged(void *none)
>>   	}
>>   
>>   	spin_lock(&khugepaged_mm_lock);
>> -	mm_slot = khugepaged_scan.mm_slot;
>> +	slot = khugepaged_scan.mm_slot;
>>   	khugepaged_scan.mm_slot = NULL;
>> -	if (mm_slot)
>> -		collect_mm_slot(mm_slot);
>> +	if (slot)
>> +		collect_mm_slot(slot);
>>   	spin_unlock(&khugepaged_mm_lock);
>>   	return 0;
>>   }
>> -- 
>> 2.34.1
>>
>>
>>
> 
> On latest mm-new tree, I am getting below error while building UML mode kernel
> for kunit.  And 'git bisect' points me this patch.  I'm not familiar with this
> code and have no time to dive deep for now, so reporting first.
> 
> Oops: general protection fault, probably for non-canonical adI
> [  356.456907] CPU: 34 UID: 0 PID: 309 Comm: khugepaged Not tainted 6.17.0-rc4+ #370 PREEMPT(voluntary)
> [  356.457702] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-4.el9 04/01/2014
> [  356.458484] RIP: 0010:collect_mm_slot (mm/khugepaged.c:1427)
> [  356.458904] Code: 48 89 df 5b e9 1a 29 f3 ff 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 908
> 
> Code starting with the faulting instruction
> ===========================================
>     0:   48 89 df                mov    %rbx,%rdi
>     3:   5b                      pop    %rbx
>     4:   e9 1a 29 f3 ff          jmp    0xfffffffffff32923
>     9:   66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
>    10:   00 00 00
>    13:   90                      nop
>    14:   90                      nop
>    15:   90                      nop
>    16:   90                      nop
>    17:   90                      nop
>    18:   90                      nop
>    19:   90                      nop
>    1a:   90                      nop
>    1b:   90                      nop
>    1c:   90                      nop
>    1d:   08                      .byte 0x8
> [  356.460685] RSP: 0018:ffffb61a46e37df8 EFLAGS: 00010286
> [  356.461115] RAX: e1bca96613f6fe2b RBX: 0000000000000000 RCX: 8000000000000007
> [  356.461692] RDX: 0000000000000001 RSI: ffffeba0443b2600 RDI: e1bca96613f6fe2b
> [  356.462269] RBP: 00000000000000f2 R08: ffff8ea80ec9aa00 R09: 0000000080150001
> [  356.462842] R10: 000000008015000e R11: 0000000000000000 R12: ffff8ea80ec9aa00
> [  356.463574] R13: 00000000000001e5 R14: 0000000000000001 R15: ffffb61a46e37e60
> [  356.464249] FS:  0000000000000000(0000) GS:ffff8eaf13dd1000(0000) knlGS:0000000000000000
> [  356.465070] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  356.465578] CR2: 00007f8e913e2a1c CR3: 00000008cb022000 CR4: 00000000000006f0
> [  356.466185] Call Trace:
> [  356.466398]  <TASK>
> [  356.466576]  khugepaged (mm/khugepaged.c:2519 mm/khugepaged.c:2556 mm/khugepaged.c:2612)
> [  356.466869]  ? __pfx_khugepaged (mm/khugepaged.c:2605)
> [  356.467284]  kthread (kernel/kthread.c:463)
> [  356.467592]  ? finish_task_switch.isra.0 (arch/x86/include/asm/paravirt.h:671 kernel/sched/sched.h:1531 kernel/sched/core.c:5105 kernel/sched/core.c:5223)
> [  356.468068]  ? __pfx_kthread (kernel/kthread.c:412)
> [  356.468480]  ret_from_fork (arch/x86/kernel/process.c:154)
> [  356.468849]  ? __pfx_kthread (kernel/kthread.c:412)
> [  356.469223]  ret_from_fork_asm (arch/x86/entry/entry_64.S:258)
> [  356.469591]  </TASK>
> [  356.469778] Modules linked in: binfmt_misc ppdev parport_pc parport pcspkr evdev joydev button serio_raw sgn
> [  356.473304] Dumping ftrace buffer:
> [  356.473618]    (ftrace buffer empty)
> [  356.473966] ---[ end trace 0000000000000000 ]---
> [  356.474506] RIP: 0010:collect_mm_slot (mm/khugepaged.c:1427)
> [  356.475142] Code: 48 89 df 5b e9 1a 29 f3 ff 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 908
> 
> Code starting with the faulting instruction
> ===========================================
>     0:   48 89 df                mov    %rbx,%rdi
>     3:   5b                      pop    %rbx
>     4:   e9 1a 29 f3 ff          jmp    0xfffffffffff32923
>     9:   66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
>    10:   00 00 00
>    13:   90                      nop
>    14:   90                      nop
>    15:   90                      nop
>    16:   90                      nop
>    17:   90                      nop
>    18:   90                      nop
>    19:   90                      nop
>    1a:   90                      nop
>    1b:   90                      nop
>    1c:   90                      nop
>    1d:   08                      .byte 0x8
> [  356.478405] RSP: 0018:ffffb61a46e37df8 EFLAGS: 00010286
> [  356.478935] RAX: e1bca96613f6fe2b RBX: 0000000000000000 RCX: 8000000000000007
> [  356.479763] RDX: 0000000000000001 RSI: ffffeba0443b2600 RDI: e1bca96613f6fe2b
> [  356.480722] RBP: 00000000000000f2 R08: ffff8ea80ec9aa00 R09: 0000000080150001
> [  356.481703] R10: 000000008015000e R11: 0000000000000000 R12: ffff8ea80ec9aa00
> [  356.482402] R13: 00000000000001e5 R14: 0000000000000001 R15: ffffb61a46e37e60
> [  356.483060] FS:  0000000000000000(0000) GS:ffff8eaf13dd1000(0000) knlGS:0000000000000000
> [  356.484027] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  356.484861] CR2: 00007f8e913e2a1c CR3: 00000008cb022000 CR4: 00000000000006f0
> [  356.485559] note: khugepaged[309] exited with preempt_count 1
> 
> 
> Thanks,
> SJ



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2 2/2] mm/khugepaged: remove definition of struct khugepaged_mm_slot
  2025-09-21 16:07     ` Lance Yang
@ 2025-09-22  0:28       ` Wei Yang
  2025-09-22  9:37         ` SeongJae Park
  0 siblings, 1 reply; 16+ messages in thread
From: Wei Yang @ 2025-09-22  0:28 UTC (permalink / raw)
  To: Lance Yang
  Cc: Wei Yang, SeongJae Park, akpm, david, lorenzo.stoakes, ziy,
	baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
	baohua, xu.xin16, linux-mm, Kiryl Shutsemau

On Mon, Sep 22, 2025 at 12:07:32AM +0800, Lance Yang wrote:
>Good catch!
>
>Looking at the crash report, this seems like a use-after-free bug
>introduced in khugepaged_scan_mm_slot(). See below please.
>
>On 2025/9/20 19:52, SeongJae Park wrote:
>> Hello,
>> 
>> On Fri, 19 Sep 2025 07:12:44 +0000 Wei Yang <richard.weiyang@gmail.com> wrote:
>> 
>> > Current code is not correct to get struct khugepaged_mm_slot by
>> > mm_slot_entry() without checking mm_slot is !NULL. There is no problem
>> > reported since slot is the first element of struct khugepaged_mm_slot.
>> > 
>> > While struct khugepaged_mm_slot is just a wrapper of struct mm_slot,
>> > there is no need to define it.
>> > 
>> > Remove the definition of struct khugepaged_mm_slot, so there is not
>> > chance to miss use mm_slot_entry().
>> > 
>> > Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> > Cc: Lance Yang <lance.yang@linux.dev>
>> > Cc: David Hildenbrand <david@redhat.com>
>> > Cc: Dev Jain <dev.jain@arm.com>
>> > Cc: Kiryl Shutsemau <kirill@shutemov.name>
>> > Cc: xu.xin16@zte.com.cn
>> > ---
>> >   mm/khugepaged.c | 57 ++++++++++++++++++-------------------------------
>> >   1 file changed, 21 insertions(+), 36 deletions(-)
>> > 
>> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> > index e019ea2cbab0..88ea92c64bf0 100644
>> > --- a/mm/khugepaged.c
>> > +++ b/mm/khugepaged.c
>> [...]
>> > @@ -2376,7 +2365,6 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>> >   	__acquires(&khugepaged_mm_lock)
>> >   {
>> >   	struct vma_iterator vmi;
>> > -	struct khugepaged_mm_slot *mm_slot;
>> >   	struct mm_slot *slot;
>> >   	struct mm_struct *mm;
>> >   	struct vm_area_struct *vma;
>> > @@ -2387,14 +2375,12 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>> >   	*result = SCAN_FAIL;
>> >   	if (khugepaged_scan.mm_slot) {
>> > -		mm_slot = khugepaged_scan.mm_slot;
>> > -		slot = &mm_slot->slot;
>> > +		slot = khugepaged_scan.mm_slot;
>> >   	} else {
>> >   		slot = list_first_entry(&khugepaged_scan.mm_head,
>> >   				     struct mm_slot, mm_node);
>> > -		mm_slot = mm_slot_entry(slot, struct khugepaged_mm_slot, slot);
>> >   		khugepaged_scan.address = 0;
>> > -		khugepaged_scan.mm_slot = mm_slot;
>> > +		khugepaged_scan.mm_slot = slot;
>> >   	}
>> >   	spin_unlock(&khugepaged_mm_lock);
>> > @@ -2492,7 +2478,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>> >   breakouterloop_mmap_lock:
>> >   	spin_lock(&khugepaged_mm_lock);
>> > -	VM_BUG_ON(khugepaged_scan.mm_slot != mm_slot);
>> > +	VM_BUG_ON(khugepaged_scan.mm_slot != slot);
>> >   	/*
>> >   	 * Release the current mm_slot if this mm is about to die, or
>> >   	 * if we scanned all vmas of this mm.
>> > @@ -2505,15 +2491,14 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>> >   		 */
>> >   		if (!list_is_last(&slot->mm_node, &khugepaged_scan.mm_head)) {
>> >   			slot = list_next_entry(slot, mm_node);
>
>In the original code, we used two distinct local variables.
>
>1) struct khugepaged_mm_slot *mm_slot:
>mm_slot consistently pointed to the item being processed in the
>current call.
>
>2) struct mm_slot *slot:
>The local slot pointer could be advanced to the next item.
>
>> > -			khugepaged_scan.mm_slot =
>> > -				mm_slot_entry(slot, struct khugepaged_mm_slot, slot);
>> > +			khugepaged_scan.mm_slot = slot;
>> >   			khugepaged_scan.address = 0;
>> >   		} else {
>> >   			khugepaged_scan.mm_slot = NULL;
>> >   			khugepaged_full_scans++;
>> >   		}
>> > -		collect_mm_slot(mm_slot);
>
>At the end, collect_mm_slot(mm_slot) correctly operated on the
>original item for that scan.
>
>> > +		collect_mm_slot(slot);
>
>However, this patch merges these two into a single slot variable.
>
>When slot = list_next_entry(slot, mm_node); is called, the slot
>pointer is updated to the next item.
>

Oops, you are right. Thanks for spotting it.

@SeongJae, would you mind applying this change and try again?

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index d28d1116e83f..fb517b5ad277 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2508,8 +2508,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
                 * mm_slot not pointing to the exiting mm.
                 */
                if (!list_is_last(&slot->mm_node, &khugepaged_scan.mm_head)) {
-                       slot = list_next_entry(slot, mm_node);
-                       khugepaged_scan.mm_slot = slot;
+                       khugepaged_scan.mm_slot = list_next_entry(slot, mm_node);
                        khugepaged_scan.address = 0;
                } else {
                        khugepaged_scan.mm_slot = NULL;

>Passing this new pointer to collect_mm_slot() then causes a
>use-after-free on the following iteration, IIUC.
>
>Cheers,
>Lance

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2 2/2] mm/khugepaged: remove definition of struct khugepaged_mm_slot
  2025-09-21 15:08         ` Wei Yang
@ 2025-09-22  9:33           ` SeongJae Park
  0 siblings, 0 replies; 16+ messages in thread
From: SeongJae Park @ 2025-09-22  9:33 UTC (permalink / raw)
  To: Wei Yang
  Cc: SeongJae Park, akpm, david, lorenzo.stoakes, ziy, baolin.wang,
	Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, lance.yang,
	xu.xin16, linux-mm, Kiryl Shutsemau

On Sun, 21 Sep 2025 15:08:30 +0000 Wei Yang <richard.weiyang@gmail.com> wrote:

> On Sat, Sep 20, 2025 at 06:41:30AM -0700, SeongJae Park wrote:
> >On Sat, 20 Sep 2025 12:29:58 +0000 Wei Yang <richard.weiyang@gmail.com> wrote:
> >
> >> On Sat, Sep 20, 2025 at 04:52:33AM -0700, SeongJae Park wrote:
[...]
> I run Qemu with kernel built from mm-new base commit b8086c280108
> "drivers/base: move memory_block_add_nid() into the caller". Then run the
> kunit command above.
> 
> Not see error for around 20 times of the test.
> 
> Is my step correct?

That's same to my repro step.

> 
> >
> >It was also reproducible when I build and install kernel on my QEMU machine
> >using 'make install'.
> >
> 
> Run qemu with the same kernel and do "make clean && make all" several times.
> 
> Not spot errors yet.
> 
> kselftests/mm/khugepaged pass with this kernel.
> 
> Will do kernel build for more time to see the effect.
> 
> If my step is not correct, please let me know.

Your steps for kunit and kernel build were same to what I did.  Maybe there are
more factors for more stable repro.  I didn't have time to dig in more, sorry.


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2 2/2] mm/khugepaged: remove definition of struct khugepaged_mm_slot
  2025-09-22  0:28       ` Wei Yang
@ 2025-09-22  9:37         ` SeongJae Park
  0 siblings, 0 replies; 16+ messages in thread
From: SeongJae Park @ 2025-09-22  9:37 UTC (permalink / raw)
  To: Wei Yang
  Cc: SeongJae Park, Lance Yang, akpm, david, lorenzo.stoakes, ziy,
	baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
	baohua, xu.xin16, linux-mm, Kiryl Shutsemau

On Mon, 22 Sep 2025 00:28:34 +0000 Wei Yang <richard.weiyang@gmail.com> wrote:

> On Mon, Sep 22, 2025 at 12:07:32AM +0800, Lance Yang wrote:
> >Good catch!
> >
> >Looking at the crash report, this seems like a use-after-free bug
> >introduced in khugepaged_scan_mm_slot(). See below please.
> >
> >On 2025/9/20 19:52, SeongJae Park wrote:
> >> Hello,
> >> 
> >> On Fri, 19 Sep 2025 07:12:44 +0000 Wei Yang <richard.weiyang@gmail.com> wrote:
> >> 
> >> > Current code is not correct to get struct khugepaged_mm_slot by
> >> > mm_slot_entry() without checking mm_slot is !NULL. There is no problem
> >> > reported since slot is the first element of struct khugepaged_mm_slot.
> >> > 
> >> > While struct khugepaged_mm_slot is just a wrapper of struct mm_slot,
> >> > there is no need to define it.
> >> > 
> >> > Remove the definition of struct khugepaged_mm_slot, so there is not
> >> > chance to miss use mm_slot_entry().
> >> > 
> >> > Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> >> > Cc: Lance Yang <lance.yang@linux.dev>
> >> > Cc: David Hildenbrand <david@redhat.com>
> >> > Cc: Dev Jain <dev.jain@arm.com>
> >> > Cc: Kiryl Shutsemau <kirill@shutemov.name>
> >> > Cc: xu.xin16@zte.com.cn
> >> > ---
> >> >   mm/khugepaged.c | 57 ++++++++++++++++++-------------------------------
> >> >   1 file changed, 21 insertions(+), 36 deletions(-)
> >> > 
> >> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >> > index e019ea2cbab0..88ea92c64bf0 100644
> >> > --- a/mm/khugepaged.c
> >> > +++ b/mm/khugepaged.c
> >> [...]
> >> > @@ -2376,7 +2365,6 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
> >> >   	__acquires(&khugepaged_mm_lock)
> >> >   {
> >> >   	struct vma_iterator vmi;
> >> > -	struct khugepaged_mm_slot *mm_slot;
> >> >   	struct mm_slot *slot;
> >> >   	struct mm_struct *mm;
> >> >   	struct vm_area_struct *vma;
> >> > @@ -2387,14 +2375,12 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
> >> >   	*result = SCAN_FAIL;
> >> >   	if (khugepaged_scan.mm_slot) {
> >> > -		mm_slot = khugepaged_scan.mm_slot;
> >> > -		slot = &mm_slot->slot;
> >> > +		slot = khugepaged_scan.mm_slot;
> >> >   	} else {
> >> >   		slot = list_first_entry(&khugepaged_scan.mm_head,
> >> >   				     struct mm_slot, mm_node);
> >> > -		mm_slot = mm_slot_entry(slot, struct khugepaged_mm_slot, slot);
> >> >   		khugepaged_scan.address = 0;
> >> > -		khugepaged_scan.mm_slot = mm_slot;
> >> > +		khugepaged_scan.mm_slot = slot;
> >> >   	}
> >> >   	spin_unlock(&khugepaged_mm_lock);
> >> > @@ -2492,7 +2478,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
> >> >   breakouterloop_mmap_lock:
> >> >   	spin_lock(&khugepaged_mm_lock);
> >> > -	VM_BUG_ON(khugepaged_scan.mm_slot != mm_slot);
> >> > +	VM_BUG_ON(khugepaged_scan.mm_slot != slot);
> >> >   	/*
> >> >   	 * Release the current mm_slot if this mm is about to die, or
> >> >   	 * if we scanned all vmas of this mm.
> >> > @@ -2505,15 +2491,14 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
> >> >   		 */
> >> >   		if (!list_is_last(&slot->mm_node, &khugepaged_scan.mm_head)) {
> >> >   			slot = list_next_entry(slot, mm_node);
> >
> >In the original code, we used two distinct local variables.
> >
> >1) struct khugepaged_mm_slot *mm_slot:
> >mm_slot consistently pointed to the item being processed in the
> >current call.
> >
> >2) struct mm_slot *slot:
> >The local slot pointer could be advanced to the next item.
> >
> >> > -			khugepaged_scan.mm_slot =
> >> > -				mm_slot_entry(slot, struct khugepaged_mm_slot, slot);
> >> > +			khugepaged_scan.mm_slot = slot;
> >> >   			khugepaged_scan.address = 0;
> >> >   		} else {
> >> >   			khugepaged_scan.mm_slot = NULL;
> >> >   			khugepaged_full_scans++;
> >> >   		}
> >> > -		collect_mm_slot(mm_slot);
> >
> >At the end, collect_mm_slot(mm_slot) correctly operated on the
> >original item for that scan.
> >
> >> > +		collect_mm_slot(slot);
> >
> >However, this patch merges these two into a single slot variable.
> >
> >When slot = list_next_entry(slot, mm_node); is called, the slot
> >pointer is updated to the next item.
> >
> 
> Oops, you are right. Thanks for spotting it.
> 
> @SeongJae, would you mind applying this change and try again?
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index d28d1116e83f..fb517b5ad277 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2508,8 +2508,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>                  * mm_slot not pointing to the exiting mm.
>                  */
>                 if (!list_is_last(&slot->mm_node, &khugepaged_scan.mm_head)) {
> -                       slot = list_next_entry(slot, mm_node);
> -                       khugepaged_scan.mm_slot = slot;
> +                       khugepaged_scan.mm_slot = list_next_entry(slot, mm_node);
>                         khugepaged_scan.address = 0;
>                 } else {
>                         khugepaged_scan.mm_slot = NULL;

Thank you for the investigation and the fix, Lance and Wei!  I just found the
above change makes my repro quiet.  I didn't have time to read the
investigation and your fix thoroughly, and my repro is not stable, though.


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Patch v2 2/2] mm/khugepaged: remove definition of struct khugepaged_mm_slot
  2025-09-19  7:36   ` David Hildenbrand
@ 2025-09-22 13:17     ` Nico Pache
  0 siblings, 0 replies; 16+ messages in thread
From: Nico Pache @ 2025-09-22 13:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, akpm, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	ryan.roberts, dev.jain, baohua, lance.yang, xu.xin16, linux-mm,
	Kiryl Shutsemau

On Fri, Sep 19, 2025 at 1:37 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 19.09.25 09:12, Wei Yang wrote:
> > Current code is not correct to get struct khugepaged_mm_slot by
> > mm_slot_entry() without checking mm_slot is !NULL. There is no problem
> > reported since slot is the first element of struct khugepaged_mm_slot.
> >
> > While struct khugepaged_mm_slot is just a wrapper of struct mm_slot,
> > there is no need to define it.
> >
> > Remove the definition of struct khugepaged_mm_slot, so there is not
> > chance to miss use mm_slot_entry().
> >
> > Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> > Cc: Lance Yang <lance.yang@linux.dev>
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Dev Jain <dev.jain@arm.com>
> > Cc: Kiryl Shutsemau <kirill@shutemov.name>
> > Cc: xu.xin16@zte.com.cn
> > ---
> >   mm/khugepaged.c | 57 ++++++++++++++++++-------------------------------
> >   1 file changed, 21 insertions(+), 36 deletions(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index e019ea2cbab0..88ea92c64bf0 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -103,14 +103,6 @@ struct collapse_control {
> >       nodemask_t alloc_nmask;
> >   };
> >
> > -/**
> > - * struct khugepaged_mm_slot - khugepaged information per mm that is being scanned
> > - * @slot: hash lookup from mm to mm_slot
> > - */
> > -struct khugepaged_mm_slot {
> > -     struct mm_slot slot;
> > -};
> > -
>
> Looking into the details, we remove the last entries from this member in
> d50791c2bee9 ("mm/khugepaged: delete
> khugepaged_collapse_pte_mapped_thps()").
>
> @Nico did you have any use case in one of your scanning-optimizing
> prototypes for khugepaged_mm_slot?

Nope! Im utilizing the scan control struct. This change looks good
with the fix in place

>
> --
> Cheers
>
> David / dhildenb
>



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-09-22 13:17 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-09-19  7:12 [Patch v2 0/2] mm_slot: fix the usage of mm_slot_entry Wei Yang
2025-09-19  7:12 ` [Patch v2 1/2] mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL Wei Yang
2025-09-19  7:24   ` David Hildenbrand
2025-09-19  7:38   ` Dev Jain
2025-09-19  7:44   ` Lance Yang
2025-09-19  7:12 ` [Patch v2 2/2] mm/khugepaged: remove definition of struct khugepaged_mm_slot Wei Yang
2025-09-19  7:36   ` David Hildenbrand
2025-09-22 13:17     ` Nico Pache
2025-09-20 11:52   ` SeongJae Park
2025-09-20 12:29     ` Wei Yang
2025-09-20 13:41       ` SeongJae Park
2025-09-21 15:08         ` Wei Yang
2025-09-22  9:33           ` SeongJae Park
2025-09-21 16:07     ` Lance Yang
2025-09-22  0:28       ` Wei Yang
2025-09-22  9:37         ` SeongJae Park

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox