linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] mm: memory_failure: unmap poisoned filio during migrate properly
@ 2025-01-13  8:27 Wupeng Ma
  2025-01-13  8:27 ` [PATCH 1/2] mm: memory_hotplug: add TTU_HWPOISON for poisoned folio during migrate Wupeng Ma
  2025-01-13  8:27 ` [PATCH 2/2] hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio Wupeng Ma
  0 siblings, 2 replies; 10+ messages in thread
From: Wupeng Ma @ 2025-01-13  8:27 UTC (permalink / raw)
  To: akpm, david, osalvador, nao.horiguchi, linmiaohe, mhocko
  Cc: mawupeng1, linux-mm, linux-kernel

From: Ma Wupeng <mawupeng1@huawei.com>

Fix two bugs during migrate folio if folio is poisoned.

Ma Wupeng (2):
  mm: memory_hotplug: add TTU_HWPOISON for poisoned folio during migrate
  hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio

 mm/memory_hotplug.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

-- 
2.25.1



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] mm: memory_hotplug: add TTU_HWPOISON for poisoned folio during migrate
  2025-01-13  8:27 [PATCH 0/2] mm: memory_failure: unmap poisoned filio during migrate properly Wupeng Ma
@ 2025-01-13  8:27 ` Wupeng Ma
  2025-01-13 13:02   ` David Hildenbrand
  2025-01-13  8:27 ` [PATCH 2/2] hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio Wupeng Ma
  1 sibling, 1 reply; 10+ messages in thread
From: Wupeng Ma @ 2025-01-13  8:27 UTC (permalink / raw)
  To: akpm, david, osalvador, nao.horiguchi, linmiaohe, mhocko
  Cc: mawupeng1, linux-mm, linux-kernel

From: Ma Wupeng <mawupeng1@huawei.com>

Commit 6da6b1d4a7df ("mm/hwpoison: convert TTU_IGNORE_HWPOISON to
TTU_HWPOISON") introduce TTU_HWPOISON to replace TTU_IGNORE_HWPOISON
in order to stop send SIGBUS signal when accessing an error page after
a memory error on a clean folio. However during page migration, task
should be killed during migrate in order to make this operation succeed.

Waring will be produced during unamp poison folio with the following log:

  ------------[ cut here ]------------
  WARNING: CPU: 1 PID: 365 at mm/rmap.c:1847 try_to_unmap_one+0x8fc/0xd3c
  Modules linked in:
  CPU: 1 UID: 0 PID: 365 Comm: bash Tainted: G        W          6.13.0-rc1-00018-gacdb4bbda7ab #42
  Tainted: [W]=WARN
  Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
  pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : try_to_unmap_one+0x8fc/0xd3c
  lr : try_to_unmap_one+0x3dc/0xd3c
  Call trace:
   try_to_unmap_one+0x8fc/0xd3c (P)
   try_to_unmap_one+0x3dc/0xd3c (L)
   rmap_walk_anon+0xdc/0x1f8
   rmap_walk+0x3c/0x58
   try_to_unmap+0x88/0x90
   unmap_poisoned_folio+0x30/0xa8
   do_migrate_range+0x4a0/0x568
   offline_pages+0x5a4/0x670
   memory_block_action+0x17c/0x374
   memory_subsys_offline+0x3c/0x78
   device_offline+0xa4/0xd0
   state_store+0x8c/0xf0
   dev_attr_store+0x18/0x2c
   sysfs_kf_write+0x44/0x54
   kernfs_fop_write_iter+0x118/0x1a8
   vfs_write+0x3a8/0x4bc
   ksys_write+0x6c/0xf8
   __arm64_sys_write+0x1c/0x28
   invoke_syscall+0x44/0x100
   el0_svc_common.constprop.0+0x40/0xe0
   do_el0_svc+0x1c/0x28
   el0_svc+0x30/0xd0
   el0t_64_sync_handler+0xc8/0xcc
   el0t_64_sync+0x198/0x19c
  ---[ end trace 0000000000000000 ]---

Add TTU_HWPOISON during unmap_poisoned_folio to fix this problem.

Fixes: 6da6b1d4a7df ("mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON")
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
---
 mm/memory_hotplug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c43b4e7fb298..330668d37e44 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1806,7 +1806,7 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			if (WARN_ON(folio_test_lru(folio)))
 				folio_isolate_lru(folio);
 			if (folio_mapped(folio))
-				unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK);
+				unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK | TTU_HWPOISON);
 			continue;
 		}
 
-- 
2.25.1



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/2] hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio
  2025-01-13  8:27 [PATCH 0/2] mm: memory_failure: unmap poisoned filio during migrate properly Wupeng Ma
  2025-01-13  8:27 ` [PATCH 1/2] mm: memory_hotplug: add TTU_HWPOISON for poisoned folio during migrate Wupeng Ma
@ 2025-01-13  8:27 ` Wupeng Ma
  2025-01-13 12:35   ` David Hildenbrand
  1 sibling, 1 reply; 10+ messages in thread
From: Wupeng Ma @ 2025-01-13  8:27 UTC (permalink / raw)
  To: akpm, david, osalvador, nao.horiguchi, linmiaohe, mhocko
  Cc: mawupeng1, linux-mm, linux-kernel

From: Ma Wupeng <mawupeng1@huawei.com>

Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to
be offlined) add page poison checks in do_migrate_range in order to make
offline hwpoisoned page possible by introducing isolate_lru_page and
try_to_unmap for hwpoisoned page. However folio lock must be held before
calling try_to_unmap. Add it to fix this problem.

Waring will be produced if filio is not locked during unmap:

  ------------[ cut here ]------------
  kernel BUG at ./include/linux/swapops.h:400!
  Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 4 UID: 0 PID: 411 Comm: bash Tainted: G        W          6.13.0-rc1-00016-g3c434c7ee82a-dirty #41
  Tainted: [W]=WARN
  Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
  pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : try_to_unmap_one+0xb08/0xd3c
  lr : try_to_unmap_one+0x3dc/0xd3c
  Call trace:
   try_to_unmap_one+0xb08/0xd3c (P)
   try_to_unmap_one+0x3dc/0xd3c (L)
   rmap_walk_anon+0xdc/0x1f8
   rmap_walk+0x3c/0x58
   try_to_unmap+0x88/0x90
   unmap_poisoned_folio+0x30/0xa8
   do_migrate_range+0x4a0/0x568
   offline_pages+0x5a4/0x670
   memory_block_action+0x17c/0x374
   memory_subsys_offline+0x3c/0x78
   device_offline+0xa4/0xd0
   state_store+0x8c/0xf0
   dev_attr_store+0x18/0x2c
   sysfs_kf_write+0x44/0x54
   kernfs_fop_write_iter+0x118/0x1a8
   vfs_write+0x3a8/0x4bc
   ksys_write+0x6c/0xf8
   __arm64_sys_write+0x1c/0x28
   invoke_syscall+0x44/0x100
   el0_svc_common.constprop.0+0x40/0xe0
   do_el0_svc+0x1c/0x28
   el0_svc+0x30/0xd0
   el0t_64_sync_handler+0xc8/0xcc
   el0t_64_sync+0x198/0x19c
  Code: f9407be0 b5fff320 d4210000 17ffff97 (d4210000)
  ---[ end trace 0000000000000000 ]---

Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined")
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
---
 mm/memory_hotplug.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 330668d37e44..9bedecfc3577 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1805,8 +1805,12 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 		    (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
 			if (WARN_ON(folio_test_lru(folio)))
 				folio_isolate_lru(folio);
-			if (folio_mapped(folio))
+			if (folio_mapped(folio)) {
+				folio_lock(folio);
 				unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK | TTU_HWPOISON);
+				folio_unlock(folio);
+			}
+
 			continue;
 		}
 
-- 
2.25.1



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio
  2025-01-13  8:27 ` [PATCH 2/2] hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio Wupeng Ma
@ 2025-01-13 12:35   ` David Hildenbrand
  2025-01-14  9:08     ` mawupeng
  2025-01-15  2:03     ` Miaohe Lin
  0 siblings, 2 replies; 10+ messages in thread
From: David Hildenbrand @ 2025-01-13 12:35 UTC (permalink / raw)
  To: Wupeng Ma, akpm, osalvador, nao.horiguchi, linmiaohe, mhocko
  Cc: linux-mm, linux-kernel

On 13.01.25 09:27, Wupeng Ma wrote:
> From: Ma Wupeng <mawupeng1@huawei.com>
> 
> Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to
> be offlined) add page poison checks in do_migrate_range in order to make
> offline hwpoisoned page possible by introducing isolate_lru_page and
> try_to_unmap for hwpoisoned page. However folio lock must be held before
> calling try_to_unmap. Add it to fix this problem.
> 
> Waring will be produced if filio is not locked during unmap:
> 
>    ------------[ cut here ]------------
>    kernel BUG at ./include/linux/swapops.h:400!
>    Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
>    Modules linked in:
>    CPU: 4 UID: 0 PID: 411 Comm: bash Tainted: G        W          6.13.0-rc1-00016-g3c434c7ee82a-dirty #41
>    Tainted: [W]=WARN
>    Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>    pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>    pc : try_to_unmap_one+0xb08/0xd3c
>    lr : try_to_unmap_one+0x3dc/0xd3c
>    Call trace:
>     try_to_unmap_one+0xb08/0xd3c (P)
>     try_to_unmap_one+0x3dc/0xd3c (L)
>     rmap_walk_anon+0xdc/0x1f8
>     rmap_walk+0x3c/0x58
>     try_to_unmap+0x88/0x90
>     unmap_poisoned_folio+0x30/0xa8
>     do_migrate_range+0x4a0/0x568
>     offline_pages+0x5a4/0x670
>     memory_block_action+0x17c/0x374
>     memory_subsys_offline+0x3c/0x78
>     device_offline+0xa4/0xd0
>     state_store+0x8c/0xf0
>     dev_attr_store+0x18/0x2c
>     sysfs_kf_write+0x44/0x54
>     kernfs_fop_write_iter+0x118/0x1a8
>     vfs_write+0x3a8/0x4bc
>     ksys_write+0x6c/0xf8
>     __arm64_sys_write+0x1c/0x28
>     invoke_syscall+0x44/0x100
>     el0_svc_common.constprop.0+0x40/0xe0
>     do_el0_svc+0x1c/0x28
>     el0_svc+0x30/0xd0
>     el0t_64_sync_handler+0xc8/0xcc
>     el0t_64_sync+0x198/0x19c
>    Code: f9407be0 b5fff320 d4210000 17ffff97 (d4210000)
>    ---[ end trace 0000000000000000 ]---
> 
> Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined")
> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
> ---
>   mm/memory_hotplug.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 330668d37e44..9bedecfc3577 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1805,8 +1805,12 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>   		    (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
>   			if (WARN_ON(folio_test_lru(folio)))
>   				folio_isolate_lru(folio);
> -			if (folio_mapped(folio))
> +			if (folio_mapped(folio)) {
> +				folio_lock(folio);
>   				unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK | TTU_HWPOISON);
> +				folio_unlock(folio);
> +			}

The comment above says "have elevated reference counts", but I I wonder 
if this code could race with un-poisoning (although probably a rare event).


If there is an elevated reference already, why not move that chunk after 
the folio_try_get() and just drop the comment that describes the implied 
magic?

I mean, migration of hwpoison is dangerous either way, so that's not the 
biggest problem I guess :)

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] mm: memory_hotplug: add TTU_HWPOISON for poisoned folio during migrate
  2025-01-13  8:27 ` [PATCH 1/2] mm: memory_hotplug: add TTU_HWPOISON for poisoned folio during migrate Wupeng Ma
@ 2025-01-13 13:02   ` David Hildenbrand
  2025-01-14  3:37     ` mawupeng
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2025-01-13 13:02 UTC (permalink / raw)
  To: Wupeng Ma, akpm, osalvador, nao.horiguchi, linmiaohe, mhocko
  Cc: linux-mm, linux-kernel

On 13.01.25 09:27, Wupeng Ma wrote:
> From: Ma Wupeng <mawupeng1@huawei.com>
> 
> Commit 6da6b1d4a7df ("mm/hwpoison: convert TTU_IGNORE_HWPOISON to
> TTU_HWPOISON") introduce TTU_HWPOISON to replace TTU_IGNORE_HWPOISON
> in order to stop send SIGBUS signal when accessing an error page after
> a memory error on a clean folio. However during page migration, task
> should be killed during migrate in order to make this operation succeed.
> 
> Waring will be produced during unamp poison folio with the following log:
> 
>    ------------[ cut here ]------------
>    WARNING: CPU: 1 PID: 365 at mm/rmap.c:1847 try_to_unmap_one+0x8fc/0xd3c

Is that the

if (unlikely(folio_test_swapbacked(folio) !=
		folio_test_swapcache(folio))) {
	WARN_ON_ONCE(1);
	goto walk_abort;
}

?

>    Modules linked in:
>    CPU: 1 UID: 0 PID: 365 Comm: bash Tainted: G        W          6.13.0-rc1-00018-gacdb4bbda7ab #42
>    Tainted: [W]=WARN
>    Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>    pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>    pc : try_to_unmap_one+0x8fc/0xd3c
>    lr : try_to_unmap_one+0x3dc/0xd3c
>    Call trace:
>     try_to_unmap_one+0x8fc/0xd3c (P)
>     try_to_unmap_one+0x3dc/0xd3c (L)
>     rmap_walk_anon+0xdc/0x1f8
>     rmap_walk+0x3c/0x58
>     try_to_unmap+0x88/0x90
>     unmap_poisoned_folio+0x30/0xa8
>     do_migrate_range+0x4a0/0x568
>     offline_pages+0x5a4/0x670
>     memory_block_action+0x17c/0x374
>     memory_subsys_offline+0x3c/0x78
>     device_offline+0xa4/0xd0
>     state_store+0x8c/0xf0
>     dev_attr_store+0x18/0x2c
>     sysfs_kf_write+0x44/0x54
>     kernfs_fop_write_iter+0x118/0x1a8
>     vfs_write+0x3a8/0x4bc
>     ksys_write+0x6c/0xf8
>     __arm64_sys_write+0x1c/0x28
>     invoke_syscall+0x44/0x100
>     el0_svc_common.constprop.0+0x40/0xe0
>     do_el0_svc+0x1c/0x28
>     el0_svc+0x30/0xd0
>     el0t_64_sync_handler+0xc8/0xcc
>     el0t_64_sync+0x198/0x19c
>    ---[ end trace 0000000000000000 ]---
> 
> Add TTU_HWPOISON during unmap_poisoned_folio to fix this problem.
> 
> Fixes: 6da6b1d4a7df ("mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON")
> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
> ---
>   mm/memory_hotplug.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index c43b4e7fb298..330668d37e44 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1806,7 +1806,7 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>   			if (WARN_ON(folio_test_lru(folio)))
>   				folio_isolate_lru(folio);
>   			if (folio_mapped(folio))
> -				unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK);
> +				unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK | TTU_HWPOISON);


Correct me if I am wrong: if we reach that point, we already failed the
unmap_poisoned_folio() in hwpoison_user_mappings(), and did a

pr_err("%#lx: failed to unmap page (folio mapcount=%d)\n", ...

there.


This all looks odd. We must not call unmap_*() on an anon folio without
setting TTU_HWPOISON ever if they are poisoned. But for the pagecache we
just might want to?


What about detecting the flags internally and just moving the detection logic into
unmap_poisoned_folio()?


diff --git a/mm/internal.h b/mm/internal.h
index 109ef30fee11f..eb8266d754b71 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1115,7 +1115,7 @@ static inline int find_next_best_node(int node, nodemask_t *used_node_mask)
   * mm/memory-failure.c
   */
  #ifdef CONFIG_MEMORY_FAILURE
-void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu);
+int unmap_poisoned_folio(struct folio *folio, bool must_kill);
  void shake_folio(struct folio *folio);
  extern int hwpoison_filter(struct page *p);
  
@@ -1138,7 +1138,7 @@ unsigned long page_mapped_in_vma(const struct page *page,
  		struct vm_area_struct *vma);
  
  #else
-static inline void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
+static inline int unmap_poisoned_folio(struct folio *folio, bool must_kill)
  {
  }
  #endif
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a7b8ccd29b6f5..2054b63e9b79c 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1556,8 +1556,29 @@ static int get_hwpoison_page(struct page *p, unsigned long flags)
  	return ret;
  }
  
-void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
+int unmap_poisoned_folio(struct folio *folio, bool must_kill)
  {
+	enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_SYNC | TTU_HWPOISON;
+	struct address_space *mapping;
+
+	/*
+	 * Propagate the dirty bit from PTEs to struct page first, because we
+	 * need this to decide if we should kill or just drop the page.
+	 * XXX: the dirty test could be racy: set_page_dirty() may not always
+	 * be called inside page lock (it's recommended but not enforced).
+	 */
+	mapping = folio_mapping(folio);
+	if (!must_kill && !folio_test_dirty(folio) && mapping &&
+	    mapping_can_writeback(mapping)) {
+		if (folio_mkclean(folio)) {
+			folio_set_dirty(folio);
+		} else {
+			ttu &= ~TTU_HWPOISON;
+			pr_info("%#lx: corrupted page was clean: dropped without side effects\n",
+				pfn);
+		}
+	}
+
  	if (folio_test_hugetlb(folio) && !folio_test_anon(folio)) {
  		struct address_space *mapping;
  
@@ -1580,6 +1601,8 @@ void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
  	} else {
  		try_to_unmap(folio, ttu);
  	}
+
+	return folio_mapped(folio) ? -EBUSY : 0;
  }
  
  /*
@@ -1589,8 +1612,6 @@ void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
  static bool hwpoison_user_mappings(struct folio *folio, struct page *p,
  		unsigned long pfn, int flags)
  {
-	enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_SYNC | TTU_HWPOISON;
-	struct address_space *mapping;
  	LIST_HEAD(tokill);
  	bool unmap_success;
  	int forcekill;
@@ -1618,24 +1639,6 @@ static bool hwpoison_user_mappings(struct folio *folio, struct page *p,
  		ttu &= ~TTU_HWPOISON;
  	}
  
-	/*
-	 * Propagate the dirty bit from PTEs to struct page first, because we
-	 * need this to decide if we should kill or just drop the page.
-	 * XXX: the dirty test could be racy: set_page_dirty() may not always
-	 * be called inside page lock (it's recommended but not enforced).
-	 */
-	mapping = folio_mapping(folio);
-	if (!(flags & MF_MUST_KILL) && !folio_test_dirty(folio) && mapping &&
-	    mapping_can_writeback(mapping)) {
-		if (folio_mkclean(folio)) {
-			folio_set_dirty(folio);
-		} else {
-			ttu &= ~TTU_HWPOISON;
-			pr_info("%#lx: corrupted page was clean: dropped without side effects\n",
-				pfn);
-		}
-	}
-
  	/*
  	 * First collect all the processes that have the page
  	 * mapped in dirty form.  This has to be done before try_to_unmap,
@@ -1643,9 +1646,7 @@ static bool hwpoison_user_mappings(struct folio *folio, struct page *p,
  	 */
  	collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
  
-	unmap_poisoned_folio(folio, ttu);
-
-	unmap_success = !folio_mapped(folio);
+	unmap_success = !unmap_poisoned_folio(folio, flags & MF_MUST_KILL);
  	if (!unmap_success)
  		pr_err("%#lx: failed to unmap page (folio mapcount=%d)\n",
  		       pfn, folio_mapcount(folio));
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e3655f07dd6e3..c549e68102251 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1833,7 +1833,7 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
  			if (WARN_ON(folio_test_lru(folio)))
  				folio_isolate_lru(folio);
  			if (folio_mapped(folio))
-				unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK);
+				unmap_poisoned_folio(folio, false);
  			continue;
  		}
  
-- 
2.47.1



-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] mm: memory_hotplug: add TTU_HWPOISON for poisoned folio during migrate
  2025-01-13 13:02   ` David Hildenbrand
@ 2025-01-14  3:37     ` mawupeng
  2025-01-14  9:35       ` David Hildenbrand
  0 siblings, 1 reply; 10+ messages in thread
From: mawupeng @ 2025-01-14  3:37 UTC (permalink / raw)
  To: david, akpm, osalvador, nao.horiguchi, linmiaohe, mhocko
  Cc: mawupeng1, linux-mm, linux-kernel



On 2025/1/13 21:02, David Hildenbrand wrote:
> On 13.01.25 09:27, Wupeng Ma wrote:
>> From: Ma Wupeng <mawupeng1@huawei.com>
>>
>> Commit 6da6b1d4a7df ("mm/hwpoison: convert TTU_IGNORE_HWPOISON to
>> TTU_HWPOISON") introduce TTU_HWPOISON to replace TTU_IGNORE_HWPOISON
>> in order to stop send SIGBUS signal when accessing an error page after
>> a memory error on a clean folio. However during page migration, task
>> should be killed during migrate in order to make this operation succeed.
>>
>> Waring will be produced during unamp poison folio with the following log:
>>
>>    ------------[ cut here ]------------
>>    WARNING: CPU: 1 PID: 365 at mm/rmap.c:1847 try_to_unmap_one+0x8fc/0xd3c
> 
> Is that the
> 
> if (unlikely(folio_test_swapbacked(folio) !=
>         folio_test_swapcache(folio))) {
>     WARN_ON_ONCE(1);
>     goto walk_abort;
> }
> 
> ?

Yes.

> 
>>    Modules linked in:
>>    CPU: 1 UID: 0 PID: 365 Comm: bash Tainted: G        W          6.13.0-rc1-00018-gacdb4bbda7ab #42
>>    Tainted: [W]=WARN
>>    Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>>    pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>    pc : try_to_unmap_one+0x8fc/0xd3c
>>    lr : try_to_unmap_one+0x3dc/0xd3c
>>    Call trace:
>>     try_to_unmap_one+0x8fc/0xd3c (P)
>>     try_to_unmap_one+0x3dc/0xd3c (L)
>>     rmap_walk_anon+0xdc/0x1f8
>>     rmap_walk+0x3c/0x58
>>     try_to_unmap+0x88/0x90
>>     unmap_poisoned_folio+0x30/0xa8
>>     do_migrate_range+0x4a0/0x568
>>     offline_pages+0x5a4/0x670
>>     memory_block_action+0x17c/0x374
>>     memory_subsys_offline+0x3c/0x78
>>     device_offline+0xa4/0xd0
>>     state_store+0x8c/0xf0
>>     dev_attr_store+0x18/0x2c
>>     sysfs_kf_write+0x44/0x54
>>     kernfs_fop_write_iter+0x118/0x1a8
>>     vfs_write+0x3a8/0x4bc
>>     ksys_write+0x6c/0xf8
>>     __arm64_sys_write+0x1c/0x28
>>     invoke_syscall+0x44/0x100
>>     el0_svc_common.constprop.0+0x40/0xe0
>>     do_el0_svc+0x1c/0x28
>>     el0_svc+0x30/0xd0
>>     el0t_64_sync_handler+0xc8/0xcc
>>     el0t_64_sync+0x198/0x19c
>>    ---[ end trace 0000000000000000 ]---
>>
>> Add TTU_HWPOISON during unmap_poisoned_folio to fix this problem.
>>
>> Fixes: 6da6b1d4a7df ("mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON")
>> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
>> ---
>>   mm/memory_hotplug.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index c43b4e7fb298..330668d37e44 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1806,7 +1806,7 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>               if (WARN_ON(folio_test_lru(folio)))
>>                   folio_isolate_lru(folio);
>>               if (folio_mapped(folio))
>> -                unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK);
>> +                unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK | TTU_HWPOISON);
> 
> 
> Correct me if I am wrong: if we reach that point, we already failed the
> unmap_poisoned_folio() in hwpoison_user_mappings(), and did a
> 
> pr_err("%#lx: failed to unmap page (folio mapcount=%d)\n", ...
> 
> there.

We have faced a different race here as follow:

CPU#0			cpu #1
			offline_pages
			do_migrate_range
memory_failure
TestSetPageHWPoison
			// folio is mark poison here
			unmap_poisoned_folio // should kill task here
...
hwpoison_user_mappings

> 
> 
> This all looks odd. We must not call unmap_*() on an anon folio without
> setting TTU_HWPOISON ever if they are poisoned. But for the pagecache we
> just might want to?
> 
> 
> What about detecting the flags internally and just moving the detection logic into
> unmap_poisoned_folio()?

Your solution do seems better and solved my problem.I'll send another patch based on
your idea.

> 
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index 109ef30fee11f..eb8266d754b71 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1115,7 +1115,7 @@ static inline int find_next_best_node(int node, nodemask_t *used_node_mask)
>   * mm/memory-failure.c
>   */
>  #ifdef CONFIG_MEMORY_FAILURE
> -void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu);
> +int unmap_poisoned_folio(struct folio *folio, bool must_kill);
>  void shake_folio(struct folio *folio);
>  extern int hwpoison_filter(struct page *p);
>  
> @@ -1138,7 +1138,7 @@ unsigned long page_mapped_in_vma(const struct page *page,
>          struct vm_area_struct *vma);
>  
>  #else
> -static inline void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
> +static inline int unmap_poisoned_folio(struct folio *folio, bool must_kill)
>  {
>  }
>  #endif
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index a7b8ccd29b6f5..2054b63e9b79c 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1556,8 +1556,29 @@ static int get_hwpoison_page(struct page *p, unsigned long flags)
>      return ret;
>  }
>  
> -void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
> +int unmap_poisoned_folio(struct folio *folio, bool must_kill)
>  {
> +    enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_SYNC | TTU_HWPOISON;
> +    struct address_space *mapping;
> +
> +    /*
> +     * Propagate the dirty bit from PTEs to struct page first, because we
> +     * need this to decide if we should kill or just drop the page.
> +     * XXX: the dirty test could be racy: set_page_dirty() may not always
> +     * be called inside page lock (it's recommended but not enforced).
> +     */
> +    mapping = folio_mapping(folio);
> +    if (!must_kill && !folio_test_dirty(folio) && mapping &&
> +        mapping_can_writeback(mapping)) {
> +        if (folio_mkclean(folio)) {
> +            folio_set_dirty(folio);
> +        } else {
> +            ttu &= ~TTU_HWPOISON;
> +            pr_info("%#lx: corrupted page was clean: dropped without side effects\n",
> +                pfn);
> +        }
> +    }
> +
>      if (folio_test_hugetlb(folio) && !folio_test_anon(folio)) {
>          struct address_space *mapping;
>  
> @@ -1580,6 +1601,8 @@ void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
>      } else {
>          try_to_unmap(folio, ttu);
>      }
> +
> +    return folio_mapped(folio) ? -EBUSY : 0;
>  }
>  
>  /*
> @@ -1589,8 +1612,6 @@ void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
>  static bool hwpoison_user_mappings(struct folio *folio, struct page *p,
>          unsigned long pfn, int flags)
>  {
> -    enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_SYNC | TTU_HWPOISON;
> -    struct address_space *mapping;
>      LIST_HEAD(tokill);
>      bool unmap_success;
>      int forcekill;
> @@ -1618,24 +1639,6 @@ static bool hwpoison_user_mappings(struct folio *folio, struct page *p,
>          ttu &= ~TTU_HWPOISON;
>      }
>  
> -    /*
> -     * Propagate the dirty bit from PTEs to struct page first, because we
> -     * need this to decide if we should kill or just drop the page.
> -     * XXX: the dirty test could be racy: set_page_dirty() may not always
> -     * be called inside page lock (it's recommended but not enforced).
> -     */
> -    mapping = folio_mapping(folio);
> -    if (!(flags & MF_MUST_KILL) && !folio_test_dirty(folio) && mapping &&
> -        mapping_can_writeback(mapping)) {
> -        if (folio_mkclean(folio)) {
> -            folio_set_dirty(folio);
> -        } else {
> -            ttu &= ~TTU_HWPOISON;
> -            pr_info("%#lx: corrupted page was clean: dropped without side effects\n",
> -                pfn);
> -        }
> -    }
> -
>      /*
>       * First collect all the processes that have the page
>       * mapped in dirty form.  This has to be done before try_to_unmap,
> @@ -1643,9 +1646,7 @@ static bool hwpoison_user_mappings(struct folio *folio, struct page *p,
>       */
>      collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
>  
> -    unmap_poisoned_folio(folio, ttu);
> -
> -    unmap_success = !folio_mapped(folio);
> +    unmap_success = !unmap_poisoned_folio(folio, flags & MF_MUST_KILL);
>      if (!unmap_success)
>          pr_err("%#lx: failed to unmap page (folio mapcount=%d)\n",
>                 pfn, folio_mapcount(folio));
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index e3655f07dd6e3..c549e68102251 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1833,7 +1833,7 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>              if (WARN_ON(folio_test_lru(folio)))
>                  folio_isolate_lru(folio);
>              if (folio_mapped(folio))
> -                unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK);
> +                unmap_poisoned_folio(folio, false);
>              continue;
>          }
>  



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio
  2025-01-13 12:35   ` David Hildenbrand
@ 2025-01-14  9:08     ` mawupeng
  2025-01-14  9:40       ` David Hildenbrand
  2025-01-15  2:03     ` Miaohe Lin
  1 sibling, 1 reply; 10+ messages in thread
From: mawupeng @ 2025-01-14  9:08 UTC (permalink / raw)
  To: david, akpm, osalvador, nao.horiguchi, linmiaohe, mhocko
  Cc: mawupeng1, linux-mm, linux-kernel



On 2025/1/13 20:35, David Hildenbrand wrote:
> On 13.01.25 09:27, Wupeng Ma wrote:
>> From: Ma Wupeng <mawupeng1@huawei.com>
>>
>> Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to
>> be offlined) add page poison checks in do_migrate_range in order to make
>> offline hwpoisoned page possible by introducing isolate_lru_page and
>> try_to_unmap for hwpoisoned page. However folio lock must be held before
>> calling try_to_unmap. Add it to fix this problem.
>>
>> Waring will be produced if filio is not locked during unmap:
>>
>>    ------------[ cut here ]------------
>>    kernel BUG at ./include/linux/swapops.h:400!
>>    Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
>>    Modules linked in:
>>    CPU: 4 UID: 0 PID: 411 Comm: bash Tainted: G        W          6.13.0-rc1-00016-g3c434c7ee82a-dirty #41
>>    Tainted: [W]=WARN
>>    Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>>    pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>    pc : try_to_unmap_one+0xb08/0xd3c
>>    lr : try_to_unmap_one+0x3dc/0xd3c
>>    Call trace:
>>     try_to_unmap_one+0xb08/0xd3c (P)
>>     try_to_unmap_one+0x3dc/0xd3c (L)
>>     rmap_walk_anon+0xdc/0x1f8
>>     rmap_walk+0x3c/0x58
>>     try_to_unmap+0x88/0x90
>>     unmap_poisoned_folio+0x30/0xa8
>>     do_migrate_range+0x4a0/0x568
>>     offline_pages+0x5a4/0x670
>>     memory_block_action+0x17c/0x374
>>     memory_subsys_offline+0x3c/0x78
>>     device_offline+0xa4/0xd0
>>     state_store+0x8c/0xf0
>>     dev_attr_store+0x18/0x2c
>>     sysfs_kf_write+0x44/0x54
>>     kernfs_fop_write_iter+0x118/0x1a8
>>     vfs_write+0x3a8/0x4bc
>>     ksys_write+0x6c/0xf8
>>     __arm64_sys_write+0x1c/0x28
>>     invoke_syscall+0x44/0x100
>>     el0_svc_common.constprop.0+0x40/0xe0
>>     do_el0_svc+0x1c/0x28
>>     el0_svc+0x30/0xd0
>>     el0t_64_sync_handler+0xc8/0xcc
>>     el0t_64_sync+0x198/0x19c
>>    Code: f9407be0 b5fff320 d4210000 17ffff97 (d4210000)
>>    ---[ end trace 0000000000000000 ]---
>>
>> Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined")
>> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
>> ---
>>   mm/memory_hotplug.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 330668d37e44..9bedecfc3577 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1805,8 +1805,12 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>               (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
>>               if (WARN_ON(folio_test_lru(folio)))
>>                   folio_isolate_lru(folio);
>> -            if (folio_mapped(folio))
>> +            if (folio_mapped(folio)) {
>> +                folio_lock(folio);
>>                   unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK | TTU_HWPOISON);
>> +                folio_unlock(folio);
>> +            }
> 
> The comment above says "have elevated reference counts", but I I wonder if this code could race with un-poisoning (although probably a rare event).
> 
> 
> If there is an elevated reference already, why not move that chunk after the folio_try_get() and just drop the comment that describes the implied magic?
> 
> I mean, migration of hwpoison is dangerous either way, so that's not the biggest problem I guess :)

AFAICT during memory_failure, the refcount will not be cleared for poisoned page in order to keep it away from free buddy pages.
So move that chunk after the folio_try_get() do seems nice and poisoned free pages do seems weird to me.

> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] mm: memory_hotplug: add TTU_HWPOISON for poisoned folio during migrate
  2025-01-14  3:37     ` mawupeng
@ 2025-01-14  9:35       ` David Hildenbrand
  0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2025-01-14  9:35 UTC (permalink / raw)
  To: mawupeng, akpm, osalvador, nao.horiguchi, linmiaohe, mhocko
  Cc: linux-mm, linux-kernel

> We have faced a different race here as follow:
> 
> CPU#0			cpu #1
> 			offline_pages
> 			do_migrate_range
> memory_failure
> TestSetPageHWPoison
> 			// folio is mark poison here
> 			unmap_poisoned_folio // should kill task here
> ...
> hwpoison_user_mappings

Interesting; TestSetPageHWPoison is called before we grab the folio 
lock, so even grabbing that around unmap_poisoned_folio() cannot 
completely help.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio
  2025-01-14  9:08     ` mawupeng
@ 2025-01-14  9:40       ` David Hildenbrand
  0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2025-01-14  9:40 UTC (permalink / raw)
  To: mawupeng, akpm, osalvador, nao.horiguchi, linmiaohe, mhocko
  Cc: linux-mm, linux-kernel

On 14.01.25 10:08, mawupeng wrote:
> 
> 
> On 2025/1/13 20:35, David Hildenbrand wrote:
>> On 13.01.25 09:27, Wupeng Ma wrote:
>>> From: Ma Wupeng <mawupeng1@huawei.com>
>>>
>>> Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to
>>> be offlined) add page poison checks in do_migrate_range in order to make
>>> offline hwpoisoned page possible by introducing isolate_lru_page and
>>> try_to_unmap for hwpoisoned page. However folio lock must be held before
>>> calling try_to_unmap. Add it to fix this problem.
>>>
>>> Waring will be produced if filio is not locked during unmap:
>>>
>>>     ------------[ cut here ]------------
>>>     kernel BUG at ./include/linux/swapops.h:400!
>>>     Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
>>>     Modules linked in:
>>>     CPU: 4 UID: 0 PID: 411 Comm: bash Tainted: G        W          6.13.0-rc1-00016-g3c434c7ee82a-dirty #41
>>>     Tainted: [W]=WARN
>>>     Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>>>     pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>     pc : try_to_unmap_one+0xb08/0xd3c
>>>     lr : try_to_unmap_one+0x3dc/0xd3c
>>>     Call trace:
>>>      try_to_unmap_one+0xb08/0xd3c (P)
>>>      try_to_unmap_one+0x3dc/0xd3c (L)
>>>      rmap_walk_anon+0xdc/0x1f8
>>>      rmap_walk+0x3c/0x58
>>>      try_to_unmap+0x88/0x90
>>>      unmap_poisoned_folio+0x30/0xa8
>>>      do_migrate_range+0x4a0/0x568
>>>      offline_pages+0x5a4/0x670
>>>      memory_block_action+0x17c/0x374
>>>      memory_subsys_offline+0x3c/0x78
>>>      device_offline+0xa4/0xd0
>>>      state_store+0x8c/0xf0
>>>      dev_attr_store+0x18/0x2c
>>>      sysfs_kf_write+0x44/0x54
>>>      kernfs_fop_write_iter+0x118/0x1a8
>>>      vfs_write+0x3a8/0x4bc
>>>      ksys_write+0x6c/0xf8
>>>      __arm64_sys_write+0x1c/0x28
>>>      invoke_syscall+0x44/0x100
>>>      el0_svc_common.constprop.0+0x40/0xe0
>>>      do_el0_svc+0x1c/0x28
>>>      el0_svc+0x30/0xd0
>>>      el0t_64_sync_handler+0xc8/0xcc
>>>      el0t_64_sync+0x198/0x19c
>>>     Code: f9407be0 b5fff320 d4210000 17ffff97 (d4210000)
>>>     ---[ end trace 0000000000000000 ]---
>>>
>>> Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined")
>>> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
>>> ---
>>>    mm/memory_hotplug.c | 6 +++++-
>>>    1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index 330668d37e44..9bedecfc3577 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -1805,8 +1805,12 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>>                (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
>>>                if (WARN_ON(folio_test_lru(folio)))
>>>                    folio_isolate_lru(folio);
>>> -            if (folio_mapped(folio))
>>> +            if (folio_mapped(folio)) {
>>> +                folio_lock(folio);
>>>                    unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK | TTU_HWPOISON);
>>> +                folio_unlock(folio);
>>> +            }
>>
>> The comment above says "have elevated reference counts", but I I wonder if this code could race with un-poisoning (although probably a rare event).
>>
>>
>> If there is an elevated reference already, why not move that chunk after the folio_try_get() and just drop the comment that describes the implied magic?
>>
>> I mean, migration of hwpoison is dangerous either way, so that's not the biggest problem I guess :)
> 
> AFAICT during memory_failure, the refcount will not be cleared for poisoned page in order to keep it away from free buddy pages.
> So move that chunk after the folio_try_get() do seems nice and poisoned free pages do seems weird to me.

If the "free" poisoned pages have a raised refcount, folio_try_get() 
would be able to grab them to essentially skip them (not mapped).

If the "free" poisoned pages don't have a raised refcount (impossible 
right now IIRC), folio_try_get() would skip, them, which would be the 
right thing to do.

So I think that should work.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio
  2025-01-13 12:35   ` David Hildenbrand
  2025-01-14  9:08     ` mawupeng
@ 2025-01-15  2:03     ` Miaohe Lin
  1 sibling, 0 replies; 10+ messages in thread
From: Miaohe Lin @ 2025-01-15  2:03 UTC (permalink / raw)
  To: David Hildenbrand, Wupeng Ma
  Cc: linux-mm, linux-kernel, akpm, osalvador, nao.horiguchi, mhocko

On 2025/1/13 20:35, David Hildenbrand wrote:
> On 13.01.25 09:27, Wupeng Ma wrote:
>> From: Ma Wupeng <mawupeng1@huawei.com>
>>
>> Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to
>> be offlined) add page poison checks in do_migrate_range in order to make
>> offline hwpoisoned page possible by introducing isolate_lru_page and
>> try_to_unmap for hwpoisoned page. However folio lock must be held before
>> calling try_to_unmap. Add it to fix this problem.
>>
>> Waring will be produced if filio is not locked during unmap:
>>
>>    ------------[ cut here ]------------
>>    kernel BUG at ./include/linux/swapops.h:400!
>>    Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
>>    Modules linked in:
>>    CPU: 4 UID: 0 PID: 411 Comm: bash Tainted: G        W          6.13.0-rc1-00016-g3c434c7ee82a-dirty #41
>>    Tainted: [W]=WARN
>>    Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>>    pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>    pc : try_to_unmap_one+0xb08/0xd3c
>>    lr : try_to_unmap_one+0x3dc/0xd3c
>>    Call trace:
>>     try_to_unmap_one+0xb08/0xd3c (P)
>>     try_to_unmap_one+0x3dc/0xd3c (L)
>>     rmap_walk_anon+0xdc/0x1f8
>>     rmap_walk+0x3c/0x58
>>     try_to_unmap+0x88/0x90
>>     unmap_poisoned_folio+0x30/0xa8
>>     do_migrate_range+0x4a0/0x568
>>     offline_pages+0x5a4/0x670
>>     memory_block_action+0x17c/0x374
>>     memory_subsys_offline+0x3c/0x78
>>     device_offline+0xa4/0xd0
>>     state_store+0x8c/0xf0
>>     dev_attr_store+0x18/0x2c
>>     sysfs_kf_write+0x44/0x54
>>     kernfs_fop_write_iter+0x118/0x1a8
>>     vfs_write+0x3a8/0x4bc
>>     ksys_write+0x6c/0xf8
>>     __arm64_sys_write+0x1c/0x28
>>     invoke_syscall+0x44/0x100
>>     el0_svc_common.constprop.0+0x40/0xe0
>>     do_el0_svc+0x1c/0x28
>>     el0_svc+0x30/0xd0
>>     el0t_64_sync_handler+0xc8/0xcc
>>     el0t_64_sync+0x198/0x19c
>>    Code: f9407be0 b5fff320 d4210000 17ffff97 (d4210000)
>>    ---[ end trace 0000000000000000 ]---
>>
>> Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined")
>> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
>> ---
>>   mm/memory_hotplug.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 330668d37e44..9bedecfc3577 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1805,8 +1805,12 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>               (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
>>               if (WARN_ON(folio_test_lru(folio)))
>>                   folio_isolate_lru(folio);
>> -            if (folio_mapped(folio))
>> +            if (folio_mapped(folio)) {
>> +                folio_lock(folio);
>>                   unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK | TTU_HWPOISON);
>> +                folio_unlock(folio);
>> +            }
> 
> The comment above says "have elevated reference counts", but I I wonder if this code could race with un-poisoning (although probably a rare event).

AFAICS, this will open the race window with unpoisoning:

unpoison_memory		do_migrate_range
			  folio_try_get
  if (folio_ref_count(folio) > 1)
    return -EBUSY;

But this should be benign one and userspace can simply retry later.

Thanks.
.


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-01-15  2:03 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-13  8:27 [PATCH 0/2] mm: memory_failure: unmap poisoned filio during migrate properly Wupeng Ma
2025-01-13  8:27 ` [PATCH 1/2] mm: memory_hotplug: add TTU_HWPOISON for poisoned folio during migrate Wupeng Ma
2025-01-13 13:02   ` David Hildenbrand
2025-01-14  3:37     ` mawupeng
2025-01-14  9:35       ` David Hildenbrand
2025-01-13  8:27 ` [PATCH 2/2] hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio Wupeng Ma
2025-01-13 12:35   ` David Hildenbrand
2025-01-14  9:08     ` mawupeng
2025-01-14  9:40       ` David Hildenbrand
2025-01-15  2:03     ` Miaohe Lin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox