linux-mm.kvack.org archive mirror
* [PATCH v3 0/5] mm: memory_hotplug: improve do_migrate_range()
@ 2024-08-27 11:47 Kefeng Wang
  2024-08-27 11:47 ` [PATCH v3 1/5] mm: memory_hotplug: remove head variable in do_migrate_range() Kefeng Wang
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Kefeng Wang @ 2024-08-27 11:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Oscar Salvador, Miaohe Lin, Naoya Horiguchi,
	linux-mm, dan.carpenter, Jonathan Cameron, Kefeng Wang

Unify hwpoisoned page handling and the isolation of HugeTLB/LRU/non-LRU
movable pages, and convert do_migrate_range() to use folios.

---
Hi Andrew, since David and Miaohe gave some more comments there are no big
changes, but an update is needed, so I reverted v2+fix, rebased on
next-20240826 and resent. I hope this is the final version; please use
this v3 if there are no more comments, thanks.

v3:
- update comment/fix typo in patch 1, per Jonathan/Miaohe
- remove the return value of unmap_poisoned_folio() and print inside it
  when the mapping for a mapped huge page could not be locked, per Miaohe
- drop incorrect KSM comments in do_migrate_range()
- refactor isolate_folio_to_list(), per David
- fix folio_put() in soft_offline_in_use_page(), per Miaohe
- use hugetlb instead of huge in patch 5, per David
- collect ACK/RB

v2-resend:
- fix isolate_hugetlb() build error in patch 1

v2:
- address comments from David (e.g. fix the HWPoison check, use a folio
  for the pfn calculation first)
- fix lkp build errors for isolate_folio_to_list()
- drop unnecessary comments and don't grab one more ref for hugetlb

Kefeng Wang (5):
  mm: memory_hotplug: remove head variable in do_migrate_range()
  mm: memory-failure: add unmap_poisoned_folio()
  mm: memory_hotplug: check hwpoisoned page firstly in
    do_migrate_range()
  mm: migrate: add isolate_folio_to_list()
  mm: memory_hotplug: unify Huge/LRU/non-LRU movable folio isolation

 include/linux/migrate.h |  3 ++
 mm/internal.h           |  8 ++++
 mm/memory-failure.c     | 91 ++++++++++++++++++-----------------------
 mm/memory_hotplug.c     | 65 ++++++++++++++---------------
 mm/migrate.c            | 26 ++++++++++++
 5 files changed, 106 insertions(+), 87 deletions(-)

-- 
2.27.0




* [PATCH v3 1/5] mm: memory_hotplug: remove head variable in do_migrate_range()
  2024-08-27 11:47 [PATCH v3 0/5] mm: memory_hotplug: improve do_migrate_range() Kefeng Wang
@ 2024-08-27 11:47 ` Kefeng Wang
  2024-09-28  4:55   ` Matthew Wilcox
  2024-08-27 11:47 ` [PATCH v3 2/5] mm: memory-failure: add unmap_poisoned_folio() Kefeng Wang
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Kefeng Wang @ 2024-08-27 11:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Oscar Salvador, Miaohe Lin, Naoya Horiguchi,
	linux-mm, dan.carpenter, Jonathan Cameron, Kefeng Wang

Directly use a folio for HugeTLB and THP when calculating the next pfn,
then remove the unused head variable.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/memory_hotplug.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 38ecd45c3f94..9ef776a25b9d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1773,7 +1773,7 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
 static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 {
 	unsigned long pfn;
-	struct page *page, *head;
+	struct page *page;
 	LIST_HEAD(source);
 	static DEFINE_RATELIMIT_STATE(migrate_rs, DEFAULT_RATELIMIT_INTERVAL,
 				      DEFAULT_RATELIMIT_BURST);
@@ -1786,14 +1786,20 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			continue;
 		page = pfn_to_page(pfn);
 		folio = page_folio(page);
-		head = &folio->page;
 
-		if (PageHuge(page)) {
-			pfn = page_to_pfn(head) + compound_nr(head) - 1;
-			isolate_hugetlb(folio, &source);
-			continue;
-		} else if (PageTransHuge(page))
-			pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
+		/*
+		 * No reference or lock is held on the folio, so it might
+		 * be modified concurrently (e.g. split).  As such,
+		 * folio_nr_pages() may read garbage.  This is fine as the outer
+		 * loop will revisit the split folio later.
+		 */
+		if (folio_test_large(folio)) {
+			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
+			if (folio_test_hugetlb(folio)) {
+				isolate_hugetlb(folio, &source);
+				continue;
+			}
+		}
 
 		/*
 		 * HWPoison pages have elevated reference counts so the migration would
-- 
2.27.0




* [PATCH v3 2/5] mm: memory-failure: add unmap_poisoned_folio()
  2024-08-27 11:47 [PATCH v3 0/5] mm: memory_hotplug: improve do_migrate_range() Kefeng Wang
  2024-08-27 11:47 ` [PATCH v3 1/5] mm: memory_hotplug: remove head variable in do_migrate_range() Kefeng Wang
@ 2024-08-27 11:47 ` Kefeng Wang
  2024-08-31  8:16   ` Miaohe Lin
  2024-08-27 11:47 ` [PATCH v3 3/5] mm: memory_hotplug: check hwpoisoned page firstly in do_migrate_range() Kefeng Wang
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Kefeng Wang @ 2024-08-27 11:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Oscar Salvador, Miaohe Lin, Naoya Horiguchi,
	linux-mm, dan.carpenter, Jonathan Cameron, Kefeng Wang

Add the unmap_poisoned_folio() helper, which will be reused by
do_migrate_range() from memory hotplug soon.

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/internal.h       |  8 ++++++++
 mm/memory-failure.c | 43 ++++++++++++++++++++++++++-----------------
 2 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 5e6f2abcea28..b00ea4595d18 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1048,6 +1048,8 @@ static inline int find_next_best_node(int node, nodemask_t *used_node_mask)
 /*
  * mm/memory-failure.c
  */
+#ifdef CONFIG_MEMORY_FAILURE
+void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu);
 void shake_folio(struct folio *folio);
 extern int hwpoison_filter(struct page *p);
 
@@ -1068,6 +1070,12 @@ void add_to_kill_ksm(struct task_struct *tsk, struct page *p,
 		     unsigned long ksm_addr);
 unsigned long page_mapped_in_vma(struct page *page, struct vm_area_struct *vma);
 
+#else
+static inline void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
+{
+}
+#endif
+
 extern unsigned long  __must_check vm_mmap_pgoff(struct file *, unsigned long,
         unsigned long, unsigned long,
         unsigned long, unsigned long);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 353254537b54..67b6b259a75d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1554,6 +1554,31 @@ static int get_hwpoison_page(struct page *p, unsigned long flags)
 	return ret;
 }
 
+void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
+{
+	if (folio_test_hugetlb(folio) && !folio_test_anon(folio)) {
+		struct address_space *mapping;
+		/*
+		 * For hugetlb folios in shared mappings, try_to_unmap
+		 * could potentially call huge_pmd_unshare.  Because of
+		 * this, take semaphore in write mode here and set
+		 * TTU_RMAP_LOCKED to indicate we have taken the lock
+		 * at this higher level.
+		 */
+		mapping = hugetlb_folio_mapping_lock_write(folio);
+		if (!mapping) {
+			pr_info("%#lx: could not lock mapping for mapped hugetlb folio\n",
+				folio_pfn(folio));
+			return;
+		}
+
+		try_to_unmap(folio, ttu|TTU_RMAP_LOCKED);
+		i_mmap_unlock_write(mapping);
+	} else {
+		try_to_unmap(folio, ttu);
+	}
+}
+
 /*
  * Do all that is necessary to remove user space mappings. Unmap
  * the pages and send SIGBUS to the processes if the data was dirty.
@@ -1615,23 +1640,7 @@ static bool hwpoison_user_mappings(struct folio *folio, struct page *p,
 	 */
 	collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
 
-	if (folio_test_hugetlb(folio) && !folio_test_anon(folio)) {
-		/*
-		 * For hugetlb pages in shared mappings, try_to_unmap
-		 * could potentially call huge_pmd_unshare.  Because of
-		 * this, take semaphore in write mode here and set
-		 * TTU_RMAP_LOCKED to indicate we have taken the lock
-		 * at this higher level.
-		 */
-		mapping = hugetlb_folio_mapping_lock_write(folio);
-		if (mapping) {
-			try_to_unmap(folio, ttu|TTU_RMAP_LOCKED);
-			i_mmap_unlock_write(mapping);
-		} else
-			pr_info("%#lx: could not lock mapping for mapped huge page\n", pfn);
-	} else {
-		try_to_unmap(folio, ttu);
-	}
+	unmap_poisoned_folio(folio, ttu);
 
 	unmap_success = !folio_mapped(folio);
 	if (!unmap_success)
-- 
2.27.0




* [PATCH v3 3/5] mm: memory_hotplug: check hwpoisoned page firstly in do_migrate_range()
  2024-08-27 11:47 [PATCH v3 0/5] mm: memory_hotplug: improve do_migrate_range() Kefeng Wang
  2024-08-27 11:47 ` [PATCH v3 1/5] mm: memory_hotplug: remove head variable in do_migrate_range() Kefeng Wang
  2024-08-27 11:47 ` [PATCH v3 2/5] mm: memory-failure: add unmap_poisoned_folio() Kefeng Wang
@ 2024-08-27 11:47 ` Kefeng Wang
  2024-08-31  8:36   ` Miaohe Lin
  2024-08-27 11:47 ` [PATCH v3 4/5] mm: migrate: add isolate_folio_to_list() Kefeng Wang
  2024-08-27 11:47 ` [PATCH v3 5/5] mm: memory_hotplug: unify Huge/LRU/non-LRU movable folio isolation Kefeng Wang
  4 siblings, 1 reply; 20+ messages in thread
From: Kefeng Wang @ 2024-08-27 11:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Oscar Salvador, Miaohe Lin, Naoya Horiguchi,
	linux-mm, dan.carpenter, Jonathan Cameron, Kefeng Wang

Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages
to be offlined") does not handle hugetlb pages, so the endless loop still
occurs when offlining a hwpoisoned hugetlb page. Luckily, after commit
e591ef7d96d6 ("mm, hwpoison,hugetlb,memory_hotplug: hotremove memory
section with hwpoisoned hugepage"), HPageMigratable is cleared for a
hwpoisoned hugetlb page, so the page is skipped in scan_movable_pages()
and the endless loop issue is fixed.

However, if the HPageMigratable() check passes (without reference and
lock), the hugetlb page may still become hwpoisoned. This does not cause
a problem, since the hwpoisoned page is handled correctly in the next
movable pages scan loop, but it is isolated in do_migrate_range() and
then fails to migrate. In order to avoid the unnecessary isolation and
to unify all hwpoisoned page handling, unconditionally check for hwpoison
first, and if it is a hwpoisoned hugetlb page, try to unmap it as the
catch-all safety net, like a normal page.

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/memory_hotplug.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9ef776a25b9d..1335fb6ef7fa 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1793,26 +1793,26 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 		 * folio_nr_pages() may read garbage.  This is fine as the outer
 		 * loop will revisit the split folio later.
 		 */
-		if (folio_test_large(folio)) {
+		if (folio_test_large(folio))
 			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
-			if (folio_test_hugetlb(folio)) {
-				isolate_hugetlb(folio, &source);
-				continue;
-			}
-		}
 
 		/*
 		 * HWPoison pages have elevated reference counts so the migration would
 		 * fail on them. It also doesn't make any sense to migrate them in the
 		 * first place. Still try to unmap such a page in case it is still mapped
-		 * (e.g. current hwpoison implementation doesn't unmap KSM pages but keep
-		 * the unmap as the catch all safety net).
+		 * (keep the unmap as the catch all safety net).
 		 */
-		if (PageHWPoison(page)) {
+		if (folio_test_hwpoison(folio) ||
+		    (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
 			if (WARN_ON(folio_test_lru(folio)))
 				folio_isolate_lru(folio);
 			if (folio_mapped(folio))
-				try_to_unmap(folio, TTU_IGNORE_MLOCK);
+				unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK);
+			continue;
+		}
+
+		if (folio_test_hugetlb(folio)) {
+			isolate_hugetlb(folio, &source);
 			continue;
 		}
 
-- 
2.27.0




* [PATCH v3 4/5] mm: migrate: add isolate_folio_to_list()
  2024-08-27 11:47 [PATCH v3 0/5] mm: memory_hotplug: improve do_migrate_range() Kefeng Wang
                   ` (2 preceding siblings ...)
  2024-08-27 11:47 ` [PATCH v3 3/5] mm: memory_hotplug: check hwpoisoned page firstly in do_migrate_range() Kefeng Wang
@ 2024-08-27 11:47 ` Kefeng Wang
  2024-08-27 11:47 ` [PATCH v3 5/5] mm: memory_hotplug: unify Huge/LRU/non-LRU movable folio isolation Kefeng Wang
  4 siblings, 0 replies; 20+ messages in thread
From: Kefeng Wang @ 2024-08-27 11:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Oscar Salvador, Miaohe Lin, Naoya Horiguchi,
	linux-mm, dan.carpenter, Jonathan Cameron, Kefeng Wang

Add an isolate_folio_to_list() helper to try to isolate HugeTLB,
non-LRU movable and LRU folios to a list, which will be reused by
do_migrate_range() from memory hotplug soon. Also drop mf_isolate_folio(),
since we can use the new helper directly in soft_offline_in_use_page().

Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Miaohe Lin <linmiaohe@huawei.com>
Tested-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 include/linux/migrate.h |  3 +++
 mm/memory-failure.c     | 48 +++++++++++------------------------------
 mm/migrate.c            | 26 ++++++++++++++++++++++
 3 files changed, 42 insertions(+), 35 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 644be30b69c8..002e49b2ebd9 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -70,6 +70,7 @@ int migrate_pages(struct list_head *l, new_folio_t new, free_folio_t free,
 		  unsigned int *ret_succeeded);
 struct folio *alloc_migration_target(struct folio *src, unsigned long private);
 bool isolate_movable_page(struct page *page, isolate_mode_t mode);
+bool isolate_folio_to_list(struct folio *folio, struct list_head *list);
 
 int migrate_huge_page_move_mapping(struct address_space *mapping,
 		struct folio *dst, struct folio *src);
@@ -91,6 +92,8 @@ static inline struct folio *alloc_migration_target(struct folio *src,
 	{ return NULL; }
 static inline bool isolate_movable_page(struct page *page, isolate_mode_t mode)
 	{ return false; }
+static inline bool isolate_folio_to_list(struct folio *folio, struct list_head *list)
+	{ return false; }
 
 static inline int migrate_huge_page_move_mapping(struct address_space *mapping,
 				  struct folio *dst, struct folio *src)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 67b6b259a75d..7da3697b33f1 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2659,40 +2659,6 @@ EXPORT_SYMBOL(unpoison_memory);
 #undef pr_fmt
 #define pr_fmt(fmt) "Soft offline: " fmt
 
-static bool mf_isolate_folio(struct folio *folio, struct list_head *pagelist)
-{
-	bool isolated = false;
-
-	if (folio_test_hugetlb(folio)) {
-		isolated = isolate_hugetlb(folio, pagelist);
-	} else {
-		bool lru = !__folio_test_movable(folio);
-
-		if (lru)
-			isolated = folio_isolate_lru(folio);
-		else
-			isolated = isolate_movable_page(&folio->page,
-							ISOLATE_UNEVICTABLE);
-
-		if (isolated) {
-			list_add(&folio->lru, pagelist);
-			if (lru)
-				node_stat_add_folio(folio, NR_ISOLATED_ANON +
-						    folio_is_file_lru(folio));
-		}
-	}
-
-	/*
-	 * If we succeed to isolate the folio, we grabbed another refcount on
-	 * the folio, so we can safely drop the one we got from get_any_page().
-	 * If we failed to isolate the folio, it means that we cannot go further
-	 * and we will return an error, so drop the reference we got from
-	 * get_any_page() as well.
-	 */
-	folio_put(folio);
-	return isolated;
-}
-
 /*
  * soft_offline_in_use_page handles hugetlb-pages and non-hugetlb pages.
  * If the page is a non-dirty unmapped page-cache page, it simply invalidates.
@@ -2705,6 +2671,7 @@ static int soft_offline_in_use_page(struct page *page)
 	struct folio *folio = page_folio(page);
 	char const *msg_page[] = {"page", "hugepage"};
 	bool huge = folio_test_hugetlb(folio);
+	bool isolated;
 	LIST_HEAD(pagelist);
 	struct migration_target_control mtc = {
 		.nid = NUMA_NO_NODE,
@@ -2744,7 +2711,18 @@ static int soft_offline_in_use_page(struct page *page)
 		return 0;
 	}
 
-	if (mf_isolate_folio(folio, &pagelist)) {
+	isolated = isolate_folio_to_list(folio, &pagelist);
+
+	/*
+	 * If we succeed to isolate the folio, we grabbed another refcount on
+	 * the folio, so we can safely drop the one we got from get_any_page().
+	 * If we failed to isolate the folio, it means that we cannot go further
+	 * and we will return an error, so drop the reference we got from
+	 * get_any_page() as well.
+	 */
+	folio_put(folio);
+
+	if (isolated) {
 		ret = migrate_pages(&pagelist, alloc_migration_target, NULL,
 			(unsigned long)&mtc, MIGRATE_SYNC, MR_MEMORY_FAILURE, NULL);
 		if (!ret) {
diff --git a/mm/migrate.c b/mm/migrate.c
index f8777d6fab57..6f9c62c746be 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -178,6 +178,32 @@ void putback_movable_pages(struct list_head *l)
 	}
 }
 
+/* Must be called with an elevated refcount on the non-hugetlb folio */
+bool isolate_folio_to_list(struct folio *folio, struct list_head *list)
+{
+	bool isolated, lru;
+
+	if (folio_test_hugetlb(folio))
+		return isolate_hugetlb(folio, list);
+
+	lru = !__folio_test_movable(folio);
+	if (lru)
+		isolated = folio_isolate_lru(folio);
+	else
+		isolated = isolate_movable_page(&folio->page,
+						ISOLATE_UNEVICTABLE);
+
+	if (!isolated)
+		return false;
+
+	list_add(&folio->lru, list);
+	if (lru)
+		node_stat_add_folio(folio, NR_ISOLATED_ANON +
+				    folio_is_file_lru(folio));
+
+	return true;
+}
+
 /*
  * Restore a potential migration pte to a working pte entry
  */
-- 
2.27.0




* [PATCH v3 5/5] mm: memory_hotplug: unify Huge/LRU/non-LRU movable folio isolation
  2024-08-27 11:47 [PATCH v3 0/5] mm: memory_hotplug: improve do_migrate_range() Kefeng Wang
                   ` (3 preceding siblings ...)
  2024-08-27 11:47 ` [PATCH v3 4/5] mm: migrate: add isolate_folio_to_list() Kefeng Wang
@ 2024-08-27 11:47 ` Kefeng Wang
  2024-08-29 15:05   ` [PATCH v3 5-fix/5] mm: memory_hotplug: unify Huge/LRU/non-LRU movable folio isolation fix Kefeng Wang
  2024-08-31  9:01   ` [PATCH v3 5/5] mm: memory_hotplug: unify Huge/LRU/non-LRU movable folio isolation Miaohe Lin
  4 siblings, 2 replies; 20+ messages in thread
From: Kefeng Wang @ 2024-08-27 11:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Oscar Salvador, Miaohe Lin, Naoya Horiguchi,
	linux-mm, dan.carpenter, Jonathan Cameron, Kefeng Wang

Use isolate_folio_to_list() to unify hugetlb/LRU/non-LRU folio
isolation, which cleans up the code a bit and saves a few calls to
compound_head().

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/memory_hotplug.c | 45 +++++++++++++++++----------------------------
 1 file changed, 17 insertions(+), 28 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1335fb6ef7fa..5f09866d17cf 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1772,15 +1772,15 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
 
 static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 {
+	struct folio *folio;
 	unsigned long pfn;
-	struct page *page;
 	LIST_HEAD(source);
 	static DEFINE_RATELIMIT_STATE(migrate_rs, DEFAULT_RATELIMIT_INTERVAL,
 				      DEFAULT_RATELIMIT_BURST);
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
-		struct folio *folio;
-		bool isolated;
+		struct page *page;
+		bool hugetlb;
 
 		if (!pfn_valid(pfn))
 			continue;
@@ -1811,34 +1811,22 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			continue;
 		}
 
-		if (folio_test_hugetlb(folio)) {
-			isolate_hugetlb(folio, &source);
-			continue;
+		hugetlb = folio_test_hugetlb(folio);
+		if (!hugetlb) {
+			folio = folio_get_nontail_page(page);
+			if (!folio)
+				continue;
 		}
 
-		if (!get_page_unless_zero(page))
-			continue;
-		/*
-		 * We can skip free pages. And we can deal with pages on
-		 * LRU and non-lru movable pages.
-		 */
-		if (PageLRU(page))
-			isolated = isolate_lru_page(page);
-		else
-			isolated = isolate_movable_page(page, ISOLATE_UNEVICTABLE);
-		if (isolated) {
-			list_add_tail(&page->lru, &source);
-			if (!__PageMovable(page))
-				inc_node_page_state(page, NR_ISOLATED_ANON +
-						    page_is_file_lru(page));
-
-		} else {
+		if (!isolate_folio_to_list(folio, &source)) {
 			if (__ratelimit(&migrate_rs)) {
 				pr_warn("failed to isolate pfn %lx\n", pfn);
 				dump_page(page, "isolation failed");
 			}
 		}
-		put_page(page);
+
+		if (!hugetlb)
+			folio_put(folio);
 	}
 	if (!list_empty(&source)) {
 		nodemask_t nmask = node_states[N_MEMORY];
@@ -1853,7 +1841,7 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 		 * We have checked that migration range is on a single zone so
 		 * we can use the nid of the first page to all the others.
 		 */
-		mtc.nid = page_to_nid(list_first_entry(&source, struct page, lru));
+		mtc.nid = folio_nid(list_first_entry(&source, struct folio, lru));
 
 		/*
 		 * try to allocate from a different node but reuse this node
@@ -1866,11 +1854,12 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 		ret = migrate_pages(&source, alloc_migration_target, NULL,
 			(unsigned long)&mtc, MIGRATE_SYNC, MR_MEMORY_HOTPLUG, NULL);
 		if (ret) {
-			list_for_each_entry(page, &source, lru) {
+			list_for_each_entry(folio, &source, lru) {
 				if (__ratelimit(&migrate_rs)) {
 					pr_warn("migrating pfn %lx failed ret:%d\n",
-						page_to_pfn(page), ret);
-					dump_page(page, "migration failure");
+						folio_pfn(folio), ret);
+					dump_page(&folio->page,
+						  "migration failure");
 				}
 			}
 			putback_movable_pages(&source);
-- 
2.27.0




* [PATCH v3 5-fix/5] mm: memory_hotplug: unify Huge/LRU/non-LRU movable folio isolation fix
  2024-08-27 11:47 ` [PATCH v3 5/5] mm: memory_hotplug: unify Huge/LRU/non-LRU movable folio isolation Kefeng Wang
@ 2024-08-29 15:05   ` Kefeng Wang
  2024-08-29 15:19     ` David Hildenbrand
  2024-08-31  9:01   ` [PATCH v3 5/5] mm: memory_hotplug: unify Huge/LRU/non-LRU movable folio isolation Miaohe Lin
  1 sibling, 1 reply; 20+ messages in thread
From: Kefeng Wang @ 2024-08-29 15:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Oscar Salvador, Miaohe Lin, Naoya Horiguchi,
	linux-mm, Kefeng Wang

  - Use folio_try_get()/folio_put() unconditionally, per David
  - Print the current pfn when isolate_folio_to_list() fails.
  - Memory hotremove test passed for free/used hugetlb folios

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/memory_hotplug.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 5f09866d17cf..621ae1015106 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1780,7 +1780,6 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
 		struct page *page;
-		bool hugetlb;
 
 		if (!pfn_valid(pfn))
 			continue;
@@ -1811,22 +1810,21 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			continue;
 		}
 
-		hugetlb = folio_test_hugetlb(folio);
-		if (!hugetlb) {
-			folio = folio_get_nontail_page(page);
-			if (!folio)
-				continue;
-		}
+		if (!folio_try_get(folio))
+			continue;
+
+		if (unlikely(page_folio(page) != folio))
+			goto put_folio;
 
 		if (!isolate_folio_to_list(folio, &source)) {
 			if (__ratelimit(&migrate_rs)) {
-				pr_warn("failed to isolate pfn %lx\n", pfn);
+				pr_warn("failed to isolate pfn %lx\n",
+					page_to_pfn(page));
 				dump_page(page, "isolation failed");
 			}
 		}
-
-		if (!hugetlb)
-			folio_put(folio);
+put_folio:
+		folio_put(folio);
 	}
 	if (!list_empty(&source)) {
 		nodemask_t nmask = node_states[N_MEMORY];
-- 
2.27.0




* Re: [PATCH v3 5-fix/5] mm: memory_hotplug: unify Huge/LRU/non-LRU movable folio isolation fix
  2024-08-29 15:05   ` [PATCH v3 5-fix/5] mm: memory_hotplug: unify Huge/LRU/non-LRU movable folio isolation fix Kefeng Wang
@ 2024-08-29 15:19     ` David Hildenbrand
  2024-08-30  1:23       ` Kefeng Wang
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2024-08-29 15:19 UTC (permalink / raw)
  To: Kefeng Wang, Andrew Morton
  Cc: Oscar Salvador, Miaohe Lin, Naoya Horiguchi, linux-mm

On 29.08.24 17:05, Kefeng Wang wrote:
>    - Use folio_try_get()/folio_put() unconditionally, per David
>    - print current pfn when isolate_folio_to_list fails.
>    - memory hotremove test passed for free/used hugetlb folio
> 
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>   mm/memory_hotplug.c | 20 +++++++++-----------
>   1 file changed, 9 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 5f09866d17cf..621ae1015106 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1780,7 +1780,6 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>   
>   	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>   		struct page *page;
> -		bool hugetlb;
>   
>   		if (!pfn_valid(pfn))
>   			continue;
> @@ -1811,22 +1810,21 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>   			continue;
>   		}
>   
> -		hugetlb = folio_test_hugetlb(folio);
> -		if (!hugetlb) {
> -			folio = folio_get_nontail_page(page);
> -			if (!folio)
> -				continue;
> -		}
> +		if (!folio_try_get(folio))
> +			continue;
> +
> +		if (unlikely(page_folio(page) != folio))
> +			goto put_folio;
>   
>   		if (!isolate_folio_to_list(folio, &source)) {
>   			if (__ratelimit(&migrate_rs)) {
> -				pr_warn("failed to isolate pfn %lx\n", pfn);
> +				pr_warn("failed to isolate pfn %lx\n",
> +					page_to_pfn(page));
>   				dump_page(page, "isolation failed");
>   			}
>   		}
> -
> -		if (!hugetlb)
> -			folio_put(folio);
> +put_folio:
> +		folio_put(folio);
>   	}
>   	if (!list_empty(&source)) {
>   		nodemask_t nmask = node_states[N_MEMORY];

LGTM, isolate_hugetlb() would not handle free hugetlb folios either, 
because of the folio_try_get().

-- 
Cheers,

David / dhildenb




* Re: [PATCH v3 5-fix/5] mm: memory_hotplug: unify Huge/LRU/non-LRU movable folio isolation fix
  2024-08-29 15:19     ` David Hildenbrand
@ 2024-08-30  1:23       ` Kefeng Wang
  0 siblings, 0 replies; 20+ messages in thread
From: Kefeng Wang @ 2024-08-30  1:23 UTC (permalink / raw)
  To: David Hildenbrand, Andrew Morton
  Cc: Oscar Salvador, Miaohe Lin, Naoya Horiguchi, linux-mm



On 2024/8/29 23:19, David Hildenbrand wrote:
> On 29.08.24 17:05, Kefeng Wang wrote:
>>    - Use folio_try_get()/folio_put() unconditionally, per David
>>    - print current pfn when isolate_folio_to_list fails.
>>    - memory hotremove test passed for free/used hugetlb folio
>>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>>   mm/memory_hotplug.c | 20 +++++++++-----------
>>   1 file changed, 9 insertions(+), 11 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 5f09866d17cf..621ae1015106 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1780,7 +1780,6 @@ static void do_migrate_range(unsigned long 
>> start_pfn, unsigned long end_pfn)
>>       for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>>           struct page *page;
>> -        bool hugetlb;
>>           if (!pfn_valid(pfn))
>>               continue;
>> @@ -1811,22 +1810,21 @@ static void do_migrate_range(unsigned long 
>> start_pfn, unsigned long end_pfn)
>>               continue;
>>           }
>> -        hugetlb = folio_test_hugetlb(folio);
>> -        if (!hugetlb) {
>> -            folio = folio_get_nontail_page(page);
>> -            if (!folio)
>> -                continue;
>> -        }
>> +        if (!folio_try_get(folio))
>> +            continue;
>> +
>> +        if (unlikely(page_folio(page) != folio))
>> +            goto put_folio;
>>           if (!isolate_folio_to_list(folio, &source)) {
>>               if (__ratelimit(&migrate_rs)) {
>> -                pr_warn("failed to isolate pfn %lx\n", pfn);
>> +                pr_warn("failed to isolate pfn %lx\n",
>> +                    page_to_pfn(page));
>>                   dump_page(page, "isolation failed");
>>               }
>>           }
>> -
>> -        if (!hugetlb)
>> -            folio_put(folio);
>> +put_folio:
>> +        folio_put(folio);
>>       }
>>       if (!list_empty(&source)) {
>>           nodemask_t nmask = node_states[N_MEMORY];
> 
> LGTM, isolate_hugetlb() would not handle free hugetlb folios either, 
> because of the folio_try_get().

Exactly, thanks for your review :)
> 



* Re: [PATCH v3 2/5] mm: memory-failure: add unmap_poisoned_folio()
  2024-08-27 11:47 ` [PATCH v3 2/5] mm: memory-failure: add unmap_poisoned_folio() Kefeng Wang
@ 2024-08-31  8:16   ` Miaohe Lin
  0 siblings, 0 replies; 20+ messages in thread
From: Miaohe Lin @ 2024-08-31  8:16 UTC (permalink / raw)
  To: Kefeng Wang, Andrew Morton
  Cc: David Hildenbrand, Oscar Salvador, Naoya Horiguchi, linux-mm,
	dan.carpenter, Jonathan Cameron

On 2024/8/27 19:47, Kefeng Wang wrote:
> Add unmap_poisoned_folio() helper which will be reused by
> do_migrate_range() from memory hotplug soon.
> 
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

LGTM. One nit below.

> ---
>  mm/internal.h       |  8 ++++++++
>  mm/memory-failure.c | 43 ++++++++++++++++++++++++++-----------------
>  2 files changed, 34 insertions(+), 17 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index 5e6f2abcea28..b00ea4595d18 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1048,6 +1048,8 @@ static inline int find_next_best_node(int node, nodemask_t *used_node_mask)
>  /*
>   * mm/memory-failure.c
>   */
> +#ifdef CONFIG_MEMORY_FAILURE
> +void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu);
>  void shake_folio(struct folio *folio);
>  extern int hwpoison_filter(struct page *p);
>  
> @@ -1068,6 +1070,12 @@ void add_to_kill_ksm(struct task_struct *tsk, struct page *p,
>  		     unsigned long ksm_addr);
>  unsigned long page_mapped_in_vma(struct page *page, struct vm_area_struct *vma);
>  
> +#else
> +static inline void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
> +{
> +}
> +#endif
> +
>  extern unsigned long  __must_check vm_mmap_pgoff(struct file *, unsigned long,
>          unsigned long, unsigned long,
>          unsigned long, unsigned long);
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 353254537b54..67b6b259a75d 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1554,6 +1554,31 @@ static int get_hwpoison_page(struct page *p, unsigned long flags)
>  	return ret;
>  }
>  
> +void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
> +{
> +	if (folio_test_hugetlb(folio) && !folio_test_anon(folio)) {
> +		struct address_space *mapping;

It might be better to have an empty line here.

> +		/*
> +		 * For hugetlb folios in shared mappings, try_to_unmap
> +		 * could potentially call huge_pmd_unshare.  Because of
> +		 * this, take semaphore in write mode here and set
> +		 * TTU_RMAP_LOCKED to indicate we have taken the lock
> +		 * at this higher level.
> +		 */
> +		mapping = hugetlb_folio_mapping_lock_write(folio);
> +		if (!mapping) {
> +			pr_info("%#lx: could not lock mapping for mapped hugetlb folio\n",
> +				folio_pfn(folio));
> +			return;
> +		}
> +

Acked-by: Miaohe Lin <linmiaohe@huawei.com>

Thanks.
.






* Re: [PATCH v3 3/5] mm: memory_hotplug: check hwpoisoned page firstly in do_migrate_range()
  2024-08-27 11:47 ` [PATCH v3 3/5] mm: memory_hotplug: check hwpoisoned page firstly in do_migrate_range() Kefeng Wang
@ 2024-08-31  8:36   ` Miaohe Lin
  0 siblings, 0 replies; 20+ messages in thread
From: Miaohe Lin @ 2024-08-31  8:36 UTC (permalink / raw)
  To: Kefeng Wang, Andrew Morton
  Cc: David Hildenbrand, Oscar Salvador, Naoya Horiguchi, linux-mm,
	dan.carpenter, Jonathan Cameron

On 2024/8/27 19:47, Kefeng Wang wrote:
> The commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned
> pages to be offlined") don't handle the hugetlb pages, the endless loop
> still occur if offline a hwpoison hugetlb page, luckly, after the
> commit e591ef7d96d6 ("mm, hwpoison,hugetlb,memory_hotplug: hotremove
> memory section with hwpoisoned hugepage"), the HPageMigratable of hugetlb
> page will be cleared, and the hwpoison hugetlb page will be skipped in
> scan_movable_pages(), so the endless loop issue is fixed.
> 
> However if the HPageMigratable() check passed(without reference and lock),
> the hugetlb page may be hwpoisoned, it won't cause issue since the
> hwpoisoned page will be handled correctly in the next movable pages scan
> loop, and it will be isolated in do_migrate_range() but fails to migrate.
> In order to avoid the unnecessary isolation and unify all hwpoisoned page
> handling, let's unconditionally check hwpoison firstly, and if it is a
> hwpoisoned hugetlb page, try to unmap it as the catch all safety net like
> normal page does.
> 
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  mm/memory_hotplug.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 9ef776a25b9d..1335fb6ef7fa 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1793,26 +1793,26 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  		 * folio_nr_pages() may read garbage.  This is fine as the outer
>  		 * loop will revisit the split folio later.
>  		 */
> -		if (folio_test_large(folio)) {
> +		if (folio_test_large(folio))
>  			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
> -			if (folio_test_hugetlb(folio)) {
> -				isolate_hugetlb(folio, &source);
> -				continue;
> -			}
> -		}
>  
>  		/*
>  		 * HWPoison pages have elevated reference counts so the migration would
>  		 * fail on them. It also doesn't make any sense to migrate them in the
>  		 * first place. Still try to unmap such a page in case it is still mapped
> -		 * (e.g. current hwpoison implementation doesn't unmap KSM pages but keep
> -		 * the unmap as the catch all safety net).
> +		 * (keep the unmap as the catch all safety net).

I tend to remove "keep the unmap as the catch all safety net" too. I think it's simply
used to describe the KSM scenario.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Thanks.
.



* Re: [PATCH v3 5/5] mm: memory_hotplug: unify Huge/LRU/non-LRU movable folio isolation
  2024-08-27 11:47 ` [PATCH v3 5/5] mm: memory_hotplug: unify Huge/LRU/non-LRU movable folio isolation Kefeng Wang
  2024-08-29 15:05   ` [PATCH v3 5-fix/5] mm: memory_hotplug: unify Huge/LRU/non-LRU movable folio isolation fix Kefeng Wang
@ 2024-08-31  9:01   ` Miaohe Lin
  1 sibling, 0 replies; 20+ messages in thread
From: Miaohe Lin @ 2024-08-31  9:01 UTC (permalink / raw)
  To: Kefeng Wang, Andrew Morton
  Cc: David Hildenbrand, Oscar Salvador, Naoya Horiguchi, linux-mm,
	dan.carpenter, Jonathan Cameron

On 2024/8/27 19:47, Kefeng Wang wrote:
> Use the isolate_folio_to_list() to unify hugetlb/LRU/non-LRU folio
> isolation, which cleanup code a bit and save a few calls to
> compound_head().
> 
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  mm/memory_hotplug.c | 45 +++++++++++++++++----------------------------
>  1 file changed, 17 insertions(+), 28 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 1335fb6ef7fa..5f09866d17cf 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1772,15 +1772,15 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
>  
>  static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  {
> +	struct folio *folio;
>  	unsigned long pfn;
> -	struct page *page;
>  	LIST_HEAD(source);
>  	static DEFINE_RATELIMIT_STATE(migrate_rs, DEFAULT_RATELIMIT_INTERVAL,
>  				      DEFAULT_RATELIMIT_BURST);
>  
>  	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> -		struct folio *folio;
> -		bool isolated;
> +		struct page *page;
> +		bool hugetlb;
>  
>  		if (!pfn_valid(pfn))
>  			continue;
> @@ -1811,34 +1811,22 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  			continue;
>  		}
>  
> -		if (folio_test_hugetlb(folio)) {
> -			isolate_hugetlb(folio, &source);
> -			continue;
> +		hugetlb = folio_test_hugetlb(folio);
> +		if (!hugetlb) {
> +			folio = folio_get_nontail_page(page);
> +			if (!folio)
> +				continue;
>  		}
>  
> -		if (!get_page_unless_zero(page))
> -			continue;
> -		/*
> -		 * We can skip free pages. And we can deal with pages on
> -		 * LRU and non-lru movable pages.
> -		 */
> -		if (PageLRU(page))
> -			isolated = isolate_lru_page(page);
> -		else
> -			isolated = isolate_movable_page(page, ISOLATE_UNEVICTABLE);
> -		if (isolated) {
> -			list_add_tail(&page->lru, &source);

This has the side effect that list_add_tail is replaced with list_add now, but it seems
this won't cause any problems.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Thanks.
.




* Re: [PATCH v3 1/5] mm: memory_hotplug: remove head variable in do_migrate_range()
  2024-08-27 11:47 ` [PATCH v3 1/5] mm: memory_hotplug: remove head variable in do_migrate_range() Kefeng Wang
@ 2024-09-28  4:55   ` Matthew Wilcox
  2024-09-28  8:34     ` David Hildenbrand
  0 siblings, 1 reply; 20+ messages in thread
From: Matthew Wilcox @ 2024-09-28  4:55 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Andrew Morton, David Hildenbrand, Oscar Salvador, Miaohe Lin,
	Naoya Horiguchi, linux-mm, dan.carpenter, Jonathan Cameron

On Tue, Aug 27, 2024 at 07:47:24PM +0800, Kefeng Wang wrote:
> Directly use a folio for HugeTLB and THP when calculate the next pfn, then
> remove unused head variable.

I just noticed this got merged.  You're going to hit BUG_ON with it.

> -		if (PageHuge(page)) {
> -			pfn = page_to_pfn(head) + compound_nr(head) - 1;
> -			isolate_hugetlb(folio, &source);
> -			continue;
> -		} else if (PageTransHuge(page))
> -			pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
> +		/*
> +		 * No reference or lock is held on the folio, so it might
> +		 * be modified concurrently (e.g. split).  As such,
> +		 * folio_nr_pages() may read garbage.  This is fine as the outer
> +		 * loop will revisit the split folio later.
> +		 */
> +		if (folio_test_large(folio)) {

But it's not fine.  Look at the implementation of folio_test_large():

static inline bool folio_test_large(const struct folio *folio)
{
        return folio_test_head(folio);
}

That's going to be provided by:

#define FOLIO_TEST_FLAG(name, page)                                     \
static __always_inline bool folio_test_##name(const struct folio *folio) \
{ return test_bit(PG_##name, const_folio_flags(folio, page)); }

and here's the BUG:

static const unsigned long *const_folio_flags(const struct folio *folio,
                unsigned n)
{
        const struct page *page = &folio->page;

        VM_BUG_ON_PGFLAGS(PageTail(page), page);
        VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
        return &page[n].flags;
}

(This page can be transformed from a head page to a tail page because,
as the comment notes, we don't hold a reference.)

Please back this out.




* Re: [PATCH v3 1/5] mm: memory_hotplug: remove head variable in do_migrate_range()
  2024-09-28  4:55   ` Matthew Wilcox
@ 2024-09-28  8:34     ` David Hildenbrand
  2024-09-28  8:39       ` David Hildenbrand
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2024-09-28  8:34 UTC (permalink / raw)
  To: Matthew Wilcox, Kefeng Wang
  Cc: Andrew Morton, Oscar Salvador, Miaohe Lin, Naoya Horiguchi,
	linux-mm, dan.carpenter, Jonathan Cameron

On 28.09.24 06:55, Matthew Wilcox wrote:
> On Tue, Aug 27, 2024 at 07:47:24PM +0800, Kefeng Wang wrote:
>> Directly use a folio for HugeTLB and THP when calculate the next pfn, then
>> remove unused head variable.
> 
> I just noticed this got merged.  You're going to hit BUG_ON with it.
> 
>> -		if (PageHuge(page)) {
>> -			pfn = page_to_pfn(head) + compound_nr(head) - 1;
>> -			isolate_hugetlb(folio, &source);
>> -			continue;
>> -		} else if (PageTransHuge(page))
>> -			pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
>> +		/*
>> +		 * No reference or lock is held on the folio, so it might
>> +		 * be modified concurrently (e.g. split).  As such,
>> +		 * folio_nr_pages() may read garbage.  This is fine as the outer
>> +		 * loop will revisit the split folio later.
>> +		 */
>> +		if (folio_test_large(folio)) {
> 
> But it's not fine.  Look at the implementation of folio_test_large():
> 
> static inline bool folio_test_large(const struct folio *folio)
> {
>          return folio_test_head(folio);
> }
> 
> That's going to be provided by:
> 
> #define FOLIO_TEST_FLAG(name, page)                                     \
> static __always_inline bool folio_test_##name(const struct folio *folio) \
> { return test_bit(PG_##name, const_folio_flags(folio, page)); }
> 
> and here's the BUG:
> 
> static const unsigned long *const_folio_flags(const struct folio *folio,
>                  unsigned n)
> {
>          const struct page *page = &folio->page;
> 
>          VM_BUG_ON_PGFLAGS(PageTail(page), page);
>          VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
>          return &page[n].flags;
> }
> 
> (this page can be transformed from a head page to a tail page because,
> as the comment notes, we don't hold a reference.
> 
> Please back this out.

Should we generalize the approach in dump_folio() to locally copy a 
folio, so we can safely perform checks before deciding whether we want 
to try grabbing a reference on the real folio (if it's still a folio :) )?
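
A rough sketch of that snapshot idea (purely illustrative, not actual
kernel code; the helper name is made up) would copy the struct folio into
a local variable and only test flags on the copy, so a concurrent split
cannot trip the assertions in const_folio_flags():

	/* Hypothetical helper: decide from a local snapshot only. */
	static bool pfn_maybe_large_folio(unsigned long pfn)
	{
		struct folio *folio = page_folio(pfn_to_page(pfn));
		struct folio snapshot;

		/* Copy once; the local snapshot cannot change under us. */
		memcpy(&snapshot, folio, sizeof(snapshot));

		/* Raw flag test on the copy, no PageTail debug assertion. */
		return test_bit(PG_head, &snapshot.page.flags);
	}

The caller would still have to folio_try_get() and recheck page_folio()
before acting on the result, since the snapshot can be stale by the time
it is used.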

-- 
Cheers,

David / dhildenb




* Re: [PATCH v3 1/5] mm: memory_hotplug: remove head variable in do_migrate_range()
  2024-09-28  8:34     ` David Hildenbrand
@ 2024-09-28  8:39       ` David Hildenbrand
  2024-09-29  1:16         ` Miaohe Lin
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2024-09-28  8:39 UTC (permalink / raw)
  To: Matthew Wilcox, Kefeng Wang
  Cc: Andrew Morton, Oscar Salvador, Miaohe Lin, Naoya Horiguchi,
	linux-mm, dan.carpenter, Jonathan Cameron

On 28.09.24 10:34, David Hildenbrand wrote:
> On 28.09.24 06:55, Matthew Wilcox wrote:
>> On Tue, Aug 27, 2024 at 07:47:24PM +0800, Kefeng Wang wrote:
>>> Directly use a folio for HugeTLB and THP when calculate the next pfn, then
>>> remove unused head variable.
>>
>> I just noticed this got merged.  You're going to hit BUG_ON with it.
>>
>>> -		if (PageHuge(page)) {
>>> -			pfn = page_to_pfn(head) + compound_nr(head) - 1;
>>> -			isolate_hugetlb(folio, &source);
>>> -			continue;
>>> -		} else if (PageTransHuge(page))
>>> -			pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
>>> +		/*
>>> +		 * No reference or lock is held on the folio, so it might
>>> +		 * be modified concurrently (e.g. split).  As such,
>>> +		 * folio_nr_pages() may read garbage.  This is fine as the outer
>>> +		 * loop will revisit the split folio later.
>>> +		 */
>>> +		if (folio_test_large(folio)) {
>>
>> But it's not fine.  Look at the implementation of folio_test_large():
>>
>> static inline bool folio_test_large(const struct folio *folio)
>> {
>>           return folio_test_head(folio);
>> }
>>
>> That's going to be provided by:
>>
>> #define FOLIO_TEST_FLAG(name, page)                                     \
>> static __always_inline bool folio_test_##name(const struct folio *folio) \
>> { return test_bit(PG_##name, const_folio_flags(folio, page)); }
>>
>> and here's the BUG:
>>
>> static const unsigned long *const_folio_flags(const struct folio *folio,
>>                   unsigned n)
>> {
>>           const struct page *page = &folio->page;
>>
>>           VM_BUG_ON_PGFLAGS(PageTail(page), page);
>>           VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
>>           return &page[n].flags;
>> }
>>
>> (this page can be transformed from a head page to a tail page because,
>> as the comment notes, we don't hold a reference.
>>
>> Please back this out.
> 
> Should we generalize the approach in dump_folio() to locally copy a
> folio, so we can safely perform checks before deciding whether we want
> to try grabbing a reference on the real folio (if it's still a folio :) )?
> 

Oh, and I forgot: isn't the existing code already racy?

PageTransHuge() -> VM_BUG_ON_PAGE(PageTail(page), page);

-- 
Cheers,

David / dhildenb




* Re: [PATCH v3 1/5] mm: memory_hotplug: remove head variable in do_migrate_range()
  2024-09-28  8:39       ` David Hildenbrand
@ 2024-09-29  1:16         ` Miaohe Lin
  2024-09-29  2:04           ` Kefeng Wang
  0 siblings, 1 reply; 20+ messages in thread
From: Miaohe Lin @ 2024-09-29  1:16 UTC (permalink / raw)
  To: David Hildenbrand, Matthew Wilcox, Kefeng Wang
  Cc: Andrew Morton, Oscar Salvador, Naoya Horiguchi, linux-mm,
	dan.carpenter, Jonathan Cameron

On 2024/9/28 16:39, David Hildenbrand wrote:
> On 28.09.24 10:34, David Hildenbrand wrote:
>> On 28.09.24 06:55, Matthew Wilcox wrote:
>>> On Tue, Aug 27, 2024 at 07:47:24PM +0800, Kefeng Wang wrote:
>>>> Directly use a folio for HugeTLB and THP when calculate the next pfn, then
>>>> remove unused head variable.
>>>
>>> I just noticed this got merged.  You're going to hit BUG_ON with it.
>>>
>>>> -        if (PageHuge(page)) {
>>>> -            pfn = page_to_pfn(head) + compound_nr(head) - 1;
>>>> -            isolate_hugetlb(folio, &source);
>>>> -            continue;
>>>> -        } else if (PageTransHuge(page))
>>>> -            pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
>>>> +        /*
>>>> +         * No reference or lock is held on the folio, so it might
>>>> +         * be modified concurrently (e.g. split).  As such,
>>>> +         * folio_nr_pages() may read garbage.  This is fine as the outer
>>>> +         * loop will revisit the split folio later.
>>>> +         */
>>>> +        if (folio_test_large(folio)) {
>>>
>>> But it's not fine.  Look at the implementation of folio_test_large():
>>>
>>> static inline bool folio_test_large(const struct folio *folio)
>>> {
>>>           return folio_test_head(folio);
>>> }
>>>
>>> That's going to be provided by:
>>>
>>> #define FOLIO_TEST_FLAG(name, page)                                     \
>>> static __always_inline bool folio_test_##name(const struct folio *folio) \
>>> { return test_bit(PG_##name, const_folio_flags(folio, page)); }
>>>
>>> and here's the BUG:
>>>
>>> static const unsigned long *const_folio_flags(const struct folio *folio,
>>>                   unsigned n)
>>> {
>>>           const struct page *page = &folio->page;
>>>
>>>           VM_BUG_ON_PGFLAGS(PageTail(page), page);
>>>           VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
>>>           return &page[n].flags;
>>> }
>>>
>>> (this page can be transformed from a head page to a tail page because,
>>> as the comment notes, we don't hold a reference.
>>>
>>> Please back this out.
>>
>> Should we generalize the approach in dump_folio() to locally copy a
>> folio, so we can safely perform checks before deciding whether we want
>> to try grabbing a reference on the real folio (if it's still a folio :) )?
>>
> 
> Oh, and I forgot: isn't the existing code already racy?
> 
> PageTransHuge() -> VM_BUG_ON_PAGE(PageTail(page), page);

do_migrate_range is called after start_isolate_page_range(). So a page might not be able to
transform from a head page to a tail page as it's isolated?

Thanks.




* Re: [PATCH v3 1/5] mm: memory_hotplug: remove head variable in do_migrate_range()
  2024-09-29  1:16         ` Miaohe Lin
@ 2024-09-29  2:04           ` Kefeng Wang
  2024-09-29  2:19             ` Miaohe Lin
  0 siblings, 1 reply; 20+ messages in thread
From: Kefeng Wang @ 2024-09-29  2:04 UTC (permalink / raw)
  To: Miaohe Lin, David Hildenbrand, Matthew Wilcox
  Cc: Andrew Morton, Oscar Salvador, Naoya Horiguchi, linux-mm,
	dan.carpenter, Jonathan Cameron



On 2024/9/29 9:16, Miaohe Lin wrote:
> On 2024/9/28 16:39, David Hildenbrand wrote:
>> On 28.09.24 10:34, David Hildenbrand wrote:
>>> On 28.09.24 06:55, Matthew Wilcox wrote:
>>>> On Tue, Aug 27, 2024 at 07:47:24PM +0800, Kefeng Wang wrote:
>>>>> Directly use a folio for HugeTLB and THP when calculate the next pfn, then
>>>>> remove unused head variable.
>>>>
>>>> I just noticed this got merged.  You're going to hit BUG_ON with it.
>>>>
>>>>> -        if (PageHuge(page)) {
>>>>> -            pfn = page_to_pfn(head) + compound_nr(head) - 1;
>>>>> -            isolate_hugetlb(folio, &source);
>>>>> -            continue;
>>>>> -        } else if (PageTransHuge(page))
>>>>> -            pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
>>>>> +        /*
>>>>> +         * No reference or lock is held on the folio, so it might
>>>>> +         * be modified concurrently (e.g. split).  As such,
>>>>> +         * folio_nr_pages() may read garbage.  This is fine as the outer
>>>>> +         * loop will revisit the split folio later.
>>>>> +         */
>>>>> +        if (folio_test_large(folio)) {
>>>>
>>>> But it's not fine.  Look at the implementation of folio_test_large():
>>>>
>>>> static inline bool folio_test_large(const struct folio *folio)
>>>> {
>>>>            return folio_test_head(folio);
>>>> }
>>>>
>>>> That's going to be provided by:
>>>>
>>>> #define FOLIO_TEST_FLAG(name, page)                                     \
>>>> static __always_inline bool folio_test_##name(const struct folio *folio) \
>>>> { return test_bit(PG_##name, const_folio_flags(folio, page)); }
>>>>
>>>> and here's the BUG:
>>>>
>>>> static const unsigned long *const_folio_flags(const struct folio *folio,
>>>>                    unsigned n)
>>>> {
>>>>            const struct page *page = &folio->page;
>>>>
>>>>            VM_BUG_ON_PGFLAGS(PageTail(page), page);
>>>>            VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
>>>>            return &page[n].flags;
>>>> }
>>>>
>>>> (this page can be transformed from a head page to a tail page because,
>>>> as the comment notes, we don't hold a reference.
>>>>
>>>> Please back this out.
>>>
>>> Should we generalize the approach in dump_folio() to locally copy a
>>> folio, so we can safely perform checks before deciding whether we want
>>> to try grabbing a reference on the real folio (if it's still a folio :) )?
>>>
>>
>> Oh, and I forgot: isn't the existing code already racy?
>>
>> PageTransHuge() -> VM_BUG_ON_PAGE(PageTail(page), page);

Yes, in v1[1], I asked the same question about the existing
PageTransHuge(page) code:

   "If the page is a tail page, we will BUG_ON(DEBUG_VM enabled) here,
    but it seems that we don't guarantee the page won't be a tail page."


We could delay the calculation until after we get a ref, but the pfn
traversal may slow down a little if we hit a tail pfn; is that acceptable?

--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1786,15 +1786,6 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
                page = pfn_to_page(pfn);
                folio = page_folio(page);

-               /*
-                * No reference or lock is held on the folio, so it might
-                * be modified concurrently (e.g. split).  As such,
-                * folio_nr_pages() may read garbage.  This is fine as the outer
-                * loop will revisit the split folio later.
-                */
-               if (folio_test_large(folio))
-                       pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
-
                /*
                 * HWPoison pages have elevated reference counts so the migration would
                 * fail on them. It also doesn't make any sense to migrate them in the
@@ -1807,6 +1798,8 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
                                folio_isolate_lru(folio);
                        if (folio_mapped(folio))
                                unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK);
+                       if (folio_test_large(folio))
+                               pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
                        continue;
                }

@@ -1823,6 +1816,9 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
                                dump_page(page, "isolation failed");
                        }
                }
+
+               if (folio_test_large(folio))
+                       pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
 put_folio:
                folio_put(folio);
        }


> 
> do_migrate_range is called after start_isolate_page_range(). So a page might not be able to
> transform from a head page to a tail page as it's isolated?
start_isolate_page_range() only isolates free pages, so maybe it's irrelevant.

> 
> Thanks.
> 

[1] https://lore.kernel.org/linux-mm/4e693aa6-d742-4fe7-bd97-3d375f96fcfa@huawei.com/




* Re: [PATCH v3 1/5] mm: memory_hotplug: remove head variable in do_migrate_range()
  2024-09-29  2:04           ` Kefeng Wang
@ 2024-09-29  2:19             ` Miaohe Lin
  2024-09-30  9:25               ` David Hildenbrand
  0 siblings, 1 reply; 20+ messages in thread
From: Miaohe Lin @ 2024-09-29  2:19 UTC (permalink / raw)
  To: Kefeng Wang, David Hildenbrand, Matthew Wilcox
  Cc: Andrew Morton, Oscar Salvador, Naoya Horiguchi, linux-mm,
	dan.carpenter, Jonathan Cameron

On 2024/9/29 10:04, Kefeng Wang wrote:
> 
> 
> On 2024/9/29 9:16, Miaohe Lin wrote:
>> On 2024/9/28 16:39, David Hildenbrand wrote:
>>> On 28.09.24 10:34, David Hildenbrand wrote:
>>>> On 28.09.24 06:55, Matthew Wilcox wrote:
>>>>> On Tue, Aug 27, 2024 at 07:47:24PM +0800, Kefeng Wang wrote:
>>>>>> Directly use a folio for HugeTLB and THP when calculate the next pfn, then
>>>>>> remove unused head variable.
>>>>>
>>>>> I just noticed this got merged.  You're going to hit BUG_ON with it.
>>>>>
>>>>>> -        if (PageHuge(page)) {
>>>>>> -            pfn = page_to_pfn(head) + compound_nr(head) - 1;
>>>>>> -            isolate_hugetlb(folio, &source);
>>>>>> -            continue;
>>>>>> -        } else if (PageTransHuge(page))
>>>>>> -            pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
>>>>>> +        /*
>>>>>> +         * No reference or lock is held on the folio, so it might
>>>>>> +         * be modified concurrently (e.g. split).  As such,
>>>>>> +         * folio_nr_pages() may read garbage.  This is fine as the outer
>>>>>> +         * loop will revisit the split folio later.
>>>>>> +         */
>>>>>> +        if (folio_test_large(folio)) {
>>>>>
>>>>> But it's not fine.  Look at the implementation of folio_test_large():
>>>>>
>>>>> static inline bool folio_test_large(const struct folio *folio)
>>>>> {
>>>>>            return folio_test_head(folio);
>>>>> }
>>>>>
>>>>> That's going to be provided by:
>>>>>
>>>>> #define FOLIO_TEST_FLAG(name, page)                                     \
>>>>> static __always_inline bool folio_test_##name(const struct folio *folio) \
>>>>> { return test_bit(PG_##name, const_folio_flags(folio, page)); }
>>>>>
>>>>> and here's the BUG:
>>>>>
>>>>> static const unsigned long *const_folio_flags(const struct folio *folio,
>>>>>                    unsigned n)
>>>>> {
>>>>>            const struct page *page = &folio->page;
>>>>>
>>>>>            VM_BUG_ON_PGFLAGS(PageTail(page), page);
>>>>>            VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
>>>>>            return &page[n].flags;
>>>>> }
>>>>>
>>>>> (this page can be transformed from a head page to a tail page because,
>>>>> as the comment notes, we don't hold a reference.
>>>>>
>>>>> Please back this out.
>>>>
>>>> Should we generalize the approach in dump_folio() to locally copy a
>>>> folio, so we can safely perform checks before deciding whether we want
>>>> to try grabbing a reference on the real folio (if it's still a folio :) )?
>>>>
>>>
>>> Oh, and I forgot: isn't the existing code already racy?
>>>
>>> PageTransHuge() -> VM_BUG_ON_PAGE(PageTail(page), page);
> 
> Yes, in v1[1], I asked the same question about the existing PageTransHuge(page) code,
> 
>   "If the page is a tail page, we will BUG_ON(DEBUG_VM enabled) here,
>    but it seems that we don't guarantee the page won't be a tail page."
> 
> 
> we could delay the calculation until after we get a ref, but the pfn traversal may slow down a little if we hit a tail pfn; is that acceptable?
> 
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1786,15 +1786,6 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>                 page = pfn_to_page(pfn);
>                 folio = page_folio(page);
> 
> -               /*
> -                * No reference or lock is held on the folio, so it might
> -                * be modified concurrently (e.g. split).  As such,
> -                * folio_nr_pages() may read garbage.  This is fine as the outer
> -                * loop will revisit the split folio later.
> -                */
> -               if (folio_test_large(folio))
> -                       pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
> -
>                 /*
>                  * HWPoison pages have elevated reference counts so the migration would
>                  * fail on them. It also doesn't make any sense to migrate them in the
> @@ -1807,6 +1798,8 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>                                 folio_isolate_lru(folio);
>                         if (folio_mapped(folio))
>                                 unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK);
> +                       if (folio_test_large(folio))
> +                               pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>                         continue;
>                 }
> 
> @@ -1823,6 +1816,9 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>                                 dump_page(page, "isolation failed");
>                         }
>                 }
> +
> +               if (folio_test_large(folio))
> +                       pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>  put_folio:
>                 folio_put(folio);
>         }
> 
> 
>>
>> do_migrate_range is called after start_isolate_page_range(). So a page might not be able to
>> transform from a head page to a tail page as it's isolated?
> start_isolate_page_range() is only isolate free pages, so maybe irrelevant.

A page transforming from a head page into a tail page should go through the steps below:
1. The compound page is freed into buddy.
2. It's merged into larger order in buddy.
3. It's allocated as a larger order compound page.

Since it is isolated, I think step 2 or 3 cannot happen. Or am I missing something?
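
To make those three steps concrete, here is a purely hypothetical
illustration using real page allocator calls (not code from this
series); in do_migrate_range() the range is already isolated, so for
free pages steps 2 and 3 are exactly what MIGRATE_ISOLATE is argued to
rule out:

#include <linux/gfp.h>

/*
 * Hypothetical example only: how a head page could in general become a
 * tail page if nothing prevented buddy merging.
 */
static void head_to_tail_example(void)
{
	struct page *head, *bigger;

	head = alloc_pages(GFP_KERNEL | __GFP_COMP, 2);	/* order-2 compound page */
	if (!head)
		return;

	__free_pages(head, 2);	/* step 1: the compound page is freed into buddy */

	/* step 2: buddy may merge it with its neighbour into an order-3 free page */

	bigger = alloc_pages(GFP_KERNEL | __GFP_COMP, 3);
	/* step 3: if this allocation covers the old pfn, the former head is now a tail */
	if (bigger)
		__free_pages(bigger, 3);
}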

Thanks.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 1/5] mm: memory_hotplug: remove head variable in do_migrate_range()
  2024-09-29  2:19             ` Miaohe Lin
@ 2024-09-30  9:25               ` David Hildenbrand
  2024-10-09  7:27                 ` Kefeng Wang
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2024-09-30  9:25 UTC (permalink / raw)
  To: Miaohe Lin, Kefeng Wang, Matthew Wilcox
  Cc: Andrew Morton, Oscar Salvador, Naoya Horiguchi, linux-mm,
	dan.carpenter, Jonathan Cameron

On 29.09.24 04:19, Miaohe Lin wrote:
> On 2024/9/29 10:04, Kefeng Wang wrote:
>>
>>
>> On 2024/9/29 9:16, Miaohe Lin wrote:
>>> On 2024/9/28 16:39, David Hildenbrand wrote:
>>>> On 28.09.24 10:34, David Hildenbrand wrote:
>>>>> On 28.09.24 06:55, Matthew Wilcox wrote:
>>>>>> On Tue, Aug 27, 2024 at 07:47:24PM +0800, Kefeng Wang wrote:
>>>>>>> Directly use a folio for HugeTLB and THP when calculate the next pfn, then
>>>>>>> remove unused head variable.
>>>>>>
>>>>>> I just noticed this got merged.  You're going to hit BUG_ON with it.
>>>>>>
>>>>>>> -        if (PageHuge(page)) {
>>>>>>> -            pfn = page_to_pfn(head) + compound_nr(head) - 1;
>>>>>>> -            isolate_hugetlb(folio, &source);
>>>>>>> -            continue;
>>>>>>> -        } else if (PageTransHuge(page))
>>>>>>> -            pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
>>>>>>> +        /*
>>>>>>> +         * No reference or lock is held on the folio, so it might
>>>>>>> +         * be modified concurrently (e.g. split).  As such,
>>>>>>> +         * folio_nr_pages() may read garbage.  This is fine as the outer
>>>>>>> +         * loop will revisit the split folio later.
>>>>>>> +         */
>>>>>>> +        if (folio_test_large(folio)) {
>>>>>>
>>>>>> But it's not fine.  Look at the implementation of folio_test_large():
>>>>>>
>>>>>> static inline bool folio_test_large(const struct folio *folio)
>>>>>> {
>>>>>>             return folio_test_head(folio);
>>>>>> }
>>>>>>
>>>>>> That's going to be provided by:
>>>>>>
>>>>>> #define FOLIO_TEST_FLAG(name, page)                                     \
>>>>>> static __always_inline bool folio_test_##name(const struct folio *folio) \
>>>>>> { return test_bit(PG_##name, const_folio_flags(folio, page)); }
>>>>>>
>>>>>> and here's the BUG:
>>>>>>
>>>>>> static const unsigned long *const_folio_flags(const struct folio *folio,
>>>>>>                     unsigned n)
>>>>>> {
>>>>>>             const struct page *page = &folio->page;
>>>>>>
>>>>>>             VM_BUG_ON_PGFLAGS(PageTail(page), page);
>>>>>>             VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
>>>>>>             return &page[n].flags;
>>>>>> }
>>>>>>
>>>>>> (this page can be transformed from a head page to a tail page because,
>>>>>> as the comment notes, we don't hold a reference.
>>>>>>
>>>>>> Please back this out.
>>>>>
>>>>> Should we generalize the approach in dump_folio() to locally copy a
>>>>> folio, so we can safely perform checks before deciding whether we want
>>>>> to try grabbing a reference on the real folio (if it's still a folio :) )?
>>>>>
>>>>
>>>> Oh, and I forgot: isn't the existing code already racy?
>>>>
>>>> PageTransHuge() -> VM_BUG_ON_PAGE(PageTail(page), page);
>>
>> Yes, in v1[1], I asked same question for existing code for PageTransHuge(page),
>>
>>    "If the page is a tail page, we will BUG_ON(DEBUG_VM enabled) here,
>>     but it seems that we don't guarantee the page won't be a tail page."
>>
>>
>> we could delay the calculation after we got a ref, but the traversal of pfn may slow down a little if hint a tail pfn, is it acceptable?
>>
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1786,15 +1786,6 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>                  page = pfn_to_page(pfn);
>>                  folio = page_folio(page);
>>
>> -               /*
>> -                * No reference or lock is held on the folio, so it might
>> -                * be modified concurrently (e.g. split).  As such,
>> -                * folio_nr_pages() may read garbage.  This is fine as the outer
>> -                * loop will revisit the split folio later.
>> -                */
>> -               if (folio_test_large(folio))
>> -                       pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>> -
>>                  /*
>>                   * HWPoison pages have elevated reference counts so the migration would
>>                   * fail on them. It also doesn't make any sense to migrate them in the
>> @@ -1807,6 +1798,8 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>                                  folio_isolate_lru(folio);
>>                          if (folio_mapped(folio))
>>                                  unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK);
>> +                       if (folio_test_large(folio))
>> +                               pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>>                          continue;
>>                  }
>>
>> @@ -1823,6 +1816,9 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>                                  dump_page(page, "isolation failed");
>>                          }
>>                  }
>> +
>> +               if (folio_test_large(folio))
>> +                       pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>>   put_folio:
>>                  folio_put(folio);
>>          }
>>
>>
>>>
>>> do_migrate_range is called after start_isolate_page_range(). So a page might not be able to
>>> transform from a head page to a tail page as it's isolated?
>> start_isolate_page_range() is only isolate free pages, so maybe irrelevant.
> 
> A page transform from a head page to a tail page should through the below steps:
> 1. The compound page is freed into buddy.
> 2. It's merged into larger order in buddy.
> 3. It's allocated as a larger order compound page.
> 
> Since it is isolated, I think step 2 or 3 cannot happen. Or am I miss something?

By isolated, you mean that the pageblock is isolated, and all free pages 
are in the MIGRATE_ISOLATE buddy list. Nice observation.

Indeed, a tail page could become a head page (concurrent split is 
possible), but a head page should not become a tail for the reason you 
mention.

Even mm/page_reporting.c will skip isolated pageblocks.

I wonder if there are some corner cases, but nothing comes to mind that 
would perform compound allocations from the MIGRATE_ISOLATE list.
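
The property relied on here (every pageblock in the hot-removed range
sits on the MIGRATE_ISOLATE free list) could be spelled out as a sanity
check. A hypothetical helper, not in mainline and assuming
CONFIG_MEMORY_ISOLATION, might look roughly like:

#include <linux/mm.h>

/*
 * Hypothetical helper, not in mainline: confirm that every pageblock in
 * [start_pfn, end_pfn) is MIGRATE_ISOLATE, i.e. the property that keeps
 * a free head page from being merged into a larger buddy.
 */
static bool range_is_isolated(unsigned long start_pfn, unsigned long end_pfn)
{
	unsigned long pfn;

	for (pfn = pageblock_start_pfn(start_pfn); pfn < end_pfn;
	     pfn += pageblock_nr_pages) {
		if (get_pageblock_migratetype(pfn_to_page(pfn)) != MIGRATE_ISOLATE)
			return false;
	}
	return true;
}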

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 1/5] mm: memory_hotplug: remove head variable in do_migrate_range()
  2024-09-30  9:25               ` David Hildenbrand
@ 2024-10-09  7:27                 ` Kefeng Wang
  0 siblings, 0 replies; 20+ messages in thread
From: Kefeng Wang @ 2024-10-09  7:27 UTC (permalink / raw)
  To: David Hildenbrand, Miaohe Lin, Matthew Wilcox
  Cc: Andrew Morton, Oscar Salvador, Naoya Horiguchi, linux-mm,
	dan.carpenter, Jonathan Cameron



On 2024/9/30 17:25, David Hildenbrand wrote:
> On 29.09.24 04:19, Miaohe Lin wrote:
>> On 2024/9/29 10:04, Kefeng Wang wrote:
>>>
>>>
>>> On 2024/9/29 9:16, Miaohe Lin wrote:
>>>> On 2024/9/28 16:39, David Hildenbrand wrote:
>>>>> On 28.09.24 10:34, David Hildenbrand wrote:
>>>>>> On 28.09.24 06:55, Matthew Wilcox wrote:
>>>>>>> On Tue, Aug 27, 2024 at 07:47:24PM +0800, Kefeng Wang wrote:
>>>>>>>> Directly use a folio for HugeTLB and THP when calculate the next pfn, then
>>>>>>>> remove unused head variable.
>>>>>>>
>>>>>>> I just noticed this got merged.  You're going to hit BUG_ON with it.
>>>>>>>
>>>>>>>> -        if (PageHuge(page)) {
>>>>>>>> -            pfn = page_to_pfn(head) + compound_nr(head) - 1;
>>>>>>>> -            isolate_hugetlb(folio, &source);
>>>>>>>> -            continue;
>>>>>>>> -        } else if (PageTransHuge(page))
>>>>>>>> -            pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
>>>>>>>> +        /*
>>>>>>>> +         * No reference or lock is held on the folio, so it might
>>>>>>>> +         * folio_nr_pages() may read garbage.  This is fine as the outer
>>>>>>>> the outer
>>>>>>>> +         * loop will revisit the split folio later.
>>>>>>>> +         */
>>>>>>>> +        if (folio_test_large(folio)) {
>>>>>>>
>>>>>>> But it's not fine.  Look at the implementation of folio_test_large():
>>>>>>>
>>>>>>> static inline bool folio_test_large(const struct folio *folio)
>>>>>>> {
>>>>>>>             return folio_test_head(folio);
>>>>>>> }
>>>>>>>
>>>>>>> That's going to be provided by:
>>>>>>>
>>>>>>> #define FOLIO_TEST_FLAG(name, page)                                     \
>>>>>>> static __always_inline bool folio_test_##name(const struct folio *folio) \
>>>>>>> { return test_bit(PG_##name, const_folio_flags(folio, page)); }
>>>>>>>
>>>>>>> and here's the BUG:
>>>>>>>
>>>>>>> static const unsigned long *const_folio_flags(const struct folio *folio,
>>>>>>>                     unsigned n)
>>>>>>> {
>>>>>>>             const struct page *page = &folio->page;
>>>>>>>
>>>>>>>             VM_BUG_ON_PGFLAGS(PageTail(page), page);
>>>>>>>             VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
>>>>>>>             return &page[n].flags;
>>>>>>> }
>>>>>>>
>>>>>>> (this page can be transformed from a head page to a tail page because,
>>>>>>> as the comment notes, we don't hold a reference.
>>>>>>>
>>>>>>> Please back this out.
>>>>>>
>>>>>> Should we generalize the approach in dump_folio() to locally copy a
>>>>>> folio, so we can safely perform checks before deciding whether we want
>>>>>> to try grabbing a reference on the real folio (if it's still a folio :) )?
>>>>>>
>>>>>
>>>>> Oh, and I forgot: isn't the existing code already racy?
>>>>>
>>>>> PageTransHuge() -> VM_BUG_ON_PAGE(PageTail(page), page);
>>>
>>> Yes, in v1[1], I asked same question for existing code for 
>>> PageTransHuge(page),
>>>
>>>    "If the page is a tail page, we will BUG_ON(DEBUG_VM enabled) here,
>>>     but it seems that we don't guarantee the page won't be a tail page."
>>>
>>>
...
>>>>
>>>> do_migrate_range is called after start_isolate_page_range(). So a 
>>>> page might not be able to
>>>> transform from a head page to a tail page as it's isolated?
>>> start_isolate_page_range() is only isolate free pages, so maybe 
>>> irrelevant.
>>
>> A page transform from a head page to a tail page should through the 
>> below steps:
>> 1. The compound page is freed into buddy.
>> 2. It's merged into larger order in buddy.
>> 3. It's allocated as a larger order compound page.
>>
>> Since it is isolated, I think step 2 or 3 cannot happen. Or am I miss 
>> something?
> 
> By isolated, you mean that the pageblock is isolated, and all free pages 
> are in the MIGRATE_ISOLATE buddy list. Nice observation.
> 
> Indeed, a tail page could become a head page (concurrent split is 
> possible), but a head page should not become a tail for the reason you 
> mention.
> 
> Even mm/page_reporting.c will skip isolated pageblocks.
> 
> I wonder if there are some corner cases, but nothing comes to mind that 
> would perform compound allocations from the MIGRATE_ISOLATE list.
> 

Per David's and Miaohe's explanations, it seems that we can't hit the VM_BUG_ON,
so no more changes are needed. Thanks for all the review and comments.





^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2024-10-09  7:27 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-08-27 11:47 [PATCH v3 0/5] mm: memory_hotplug: improve do_migrate_range() Kefeng Wang
2024-08-27 11:47 ` [PATCH v3 1/5] mm: memory_hotplug: remove head variable in do_migrate_range() Kefeng Wang
2024-09-28  4:55   ` Matthew Wilcox
2024-09-28  8:34     ` David Hildenbrand
2024-09-28  8:39       ` David Hildenbrand
2024-09-29  1:16         ` Miaohe Lin
2024-09-29  2:04           ` Kefeng Wang
2024-09-29  2:19             ` Miaohe Lin
2024-09-30  9:25               ` David Hildenbrand
2024-10-09  7:27                 ` Kefeng Wang
2024-08-27 11:47 ` [PATCH v3 2/5] mm: memory-failure: add unmap_poisoned_folio() Kefeng Wang
2024-08-31  8:16   ` Miaohe Lin
2024-08-27 11:47 ` [PATCH v3 3/5] mm: memory_hotplug: check hwpoisoned page firstly in do_migrate_range() Kefeng Wang
2024-08-31  8:36   ` Miaohe Lin
2024-08-27 11:47 ` [PATCH v3 4/5] mm: migrate: add isolate_folio_to_list() Kefeng Wang
2024-08-27 11:47 ` [PATCH v3 5/5] mm: memory_hotplug: unify Huge/LRU/non-LRU movable folio isolation Kefeng Wang
2024-08-29 15:05   ` [PATCH v3 5-fix/5] mm: memory_hotplug: unify Huge/LRU/non-LRU movable folio isolation fix Kefeng Wang
2024-08-29 15:19     ` David Hildenbrand
2024-08-30  1:23       ` Kefeng Wang
2024-08-31  9:01   ` [PATCH v3 5/5] mm: memory_hotplug: unify Huge/LRU/non-LRU movable folio isolation Miaohe Lin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox