* [PATCH v4 0/6] mm: migrate: support poison recover from migrate folio
@ 2024-06-03 9:24 Kefeng Wang
2024-06-03 9:24 ` [PATCH v4 1/6] mm: move memory_failure_queue() into copy_mc_[user]_highpage() Kefeng Wang
` (5 more replies)
0 siblings, 6 replies; 25+ messages in thread
From: Kefeng Wang @ 2024-06-03 9:24 UTC (permalink / raw)
To: akpm, linux-mm
Cc: Tony Luck, Miaohe Lin, nao.horiguchi, Matthew Wilcox,
David Hildenbrand, Muchun Song, Benjamin LaHaise, jglisse,
Jiaqi Yan, Hugh Dickins, Vishal Moola, Alistair Popple, Jane Chu,
Oscar Salvador, Lance Yang, Kefeng Wang
Folio migration is widely used in the kernel: memory compaction, memory
hotplug, soft page offlining, NUMA balancing, memory demotion/promotion,
etc. But once a poisoned source folio is accessed while migrating, the
kernel will panic.
There is a mechanism in the kernel to recover from uncorrectable memory
errors, ARCH_HAS_COPY_MC (e.g. Machine Check Safe Memory Copy on x86), which
is already used in NVDIMM and core-mm paths (e.g. CoW, khugepaged, coredump,
KSM copy); see the copy_mc_to_{user,kernel} and copy_mc_{user_}highpage callers.
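For context, that primitive behaves roughly as sketched below (a simplified
view of include/linux/highmem.h, not the verbatim code): with ARCH_HAS_COPY_MC
the copy reports a consumed #MC through a non-zero return value, otherwise it
falls back to a plain copy that cannot report poison at all.
/* Simplified sketch only; the real definitions live in include/linux/highmem.h. */
#ifdef copy_mc_to_kernel			/* ARCH_HAS_COPY_MC */
static inline int copy_mc_highpage(struct page *to, struct page *from)
{
	char *vfrom = kmap_local_page(from);
	char *vto = kmap_local_page(to);
	/* Non-zero means the copy consumed a machine check on the source. */
	int ret = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE);
	kunmap_local(vto);
	kunmap_local(vfrom);
	return ret;
}
#else
static inline int copy_mc_highpage(struct page *to, struct page *from)
{
	copy_highpage(to, from);	/* a poisoned source is fatal here */
	return 0;
}
#endif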
This series adds the same recovery mechanism to the folio copy step of the
widely used folio migration. Note that because folio migration is never
guaranteed to succeed, we can choose to make it tolerant of memory failures:
folio_mc_copy() is a #MC-aware version of folio_copy(), and once a poisoned
source folio is hit we return an error and fail the migration, which avoids
panics like the one shown below.
CPU: 1 PID: 88343 Comm: test_softofflin Kdump: loaded Not tainted 6.6.0
pc : copy_page+0x10/0xc0
lr : copy_highpage+0x38/0x50
...
Call trace:
copy_page+0x10/0xc0
folio_copy+0x78/0x90
migrate_folio_extra+0x54/0xa0
move_to_new_folio+0xd8/0x1f0
migrate_folio_move+0xb8/0x300
migrate_pages_batch+0x528/0x788
migrate_pages_sync+0x8c/0x258
migrate_pages+0x440/0x528
soft_offline_in_use_page+0x2ec/0x3c0
soft_offline_page+0x238/0x310
soft_offline_page_store+0x6c/0xc0
dev_attr_store+0x20/0x40
sysfs_kf_write+0x4c/0x68
kernfs_fop_write_iter+0x130/0x1c8
new_sync_write+0xa4/0x138
vfs_write+0x238/0x2d8
ksys_write+0x74/0x110
v4:
- return -EHWPOISON instead of -EFAULT in folio_mc_copy() and omit a ret
variable, per Jane and Lance
- return what folio_mc_copy() returns from callers, per Jane
- move memory_failure_queue() into copy_mc_[user_]highpage() instead of
calling it after each copy_mc_[user]_highpage caller, which avoids
re-using the poisoned page, per Luck, Tony
v3:
- keep only the folio migration recovery support, since the cleanup part
has been merged into mm-unstable
- don't introduce a new folio_refs_check_and_freeze(); just move the
refcount check and folio freeze out, and update the changelog of 'mm:
migrate: split folio_migrate_mapping()'
- reorder patches and rebase on next-20240528
- https://lore.kernel.org/linux-mm/20240528134513.2283548-1-wangkefeng.wang@huawei.com/
Kefeng Wang (6):
mm: move memory_failure_queue() into copy_mc_[user]_highpage()
mm: add folio_mc_copy()
mm: migrate: split folio_migrate_mapping()
mm: migrate: support poisoned recover from migrate folio
fs: hugetlbfs: support poison recover from hugetlbfs_migrate_folio()
mm: migrate: remove folio_migrate_copy()
fs/aio.c | 3 +-
fs/hugetlbfs/inode.c | 2 +-
include/linux/highmem.h | 6 +++
include/linux/migrate.h | 1 -
include/linux/mm.h | 1 +
mm/ksm.c | 1 -
mm/memory.c | 12 ++---
mm/migrate.c | 107 ++++++++++++++++++++++------------------
mm/util.c | 17 +++++++
9 files changed, 89 insertions(+), 61 deletions(-)
--
2.27.0
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v4 1/6] mm: move memory_failure_queue() into copy_mc_[user]_highpage()
2024-06-03 9:24 [PATCH v4 0/6] mm: migrate: support poison recover from migrate folio Kefeng Wang
@ 2024-06-03 9:24 ` Kefeng Wang
2024-06-04 19:38 ` Jane Chu
2024-06-06 2:28 ` Miaohe Lin
2024-06-03 9:24 ` [PATCH v4 2/6] mm: add folio_mc_copy() Kefeng Wang
` (4 subsequent siblings)
5 siblings, 2 replies; 25+ messages in thread
From: Kefeng Wang @ 2024-06-03 9:24 UTC (permalink / raw)
To: akpm, linux-mm
Cc: Tony Luck, Miaohe Lin, nao.horiguchi, Matthew Wilcox,
David Hildenbrand, Muchun Song, Benjamin LaHaise, jglisse,
Jiaqi Yan, Hugh Dickins, Vishal Moola, Alistair Popple, Jane Chu,
Oscar Salvador, Lance Yang, Kefeng Wang
There is a memory_failure_queue() call after copy_mc_[user]_highpage(),
see its callers, e.g. the CoW and KSM page copies; it is used to mark the
source page as hardware-poisoned and unmap it from other tasks. The
upcoming poison recovery for folio migration will need to do the same
thing, so move the memory_failure_queue() call into
copy_mc_[user]_highpage() itself instead of adding it to each user. This
should also improve the handling of poisoned pages in khugepaged.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
include/linux/highmem.h | 6 ++++++
mm/ksm.c | 1 -
mm/memory.c | 12 +++---------
3 files changed, 9 insertions(+), 10 deletions(-)
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 00341b56d291..6b0d6f3c8580 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -352,6 +352,9 @@ static inline int copy_mc_user_highpage(struct page *to, struct page *from,
kunmap_local(vto);
kunmap_local(vfrom);
+ if (ret)
+ memory_failure_queue(page_to_pfn(from), 0);
+
return ret;
}
@@ -368,6 +371,9 @@ static inline int copy_mc_highpage(struct page *to, struct page *from)
kunmap_local(vto);
kunmap_local(vfrom);
+ if (ret)
+ memory_failure_queue(page_to_pfn(from), 0);
+
return ret;
}
#else
diff --git a/mm/ksm.c b/mm/ksm.c
index 452ac8346e6e..3d95e5a9f301 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -3091,7 +3091,6 @@ struct folio *ksm_might_need_to_copy(struct folio *folio,
if (copy_mc_user_highpage(folio_page(new_folio, 0), page,
addr, vma)) {
folio_put(new_folio);
- memory_failure_queue(folio_pfn(folio), 0);
return ERR_PTR(-EHWPOISON);
}
folio_set_dirty(new_folio);
diff --git a/mm/memory.c b/mm/memory.c
index 63f9f98b47bd..e06de844eaba 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3034,10 +3034,8 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src,
unsigned long addr = vmf->address;
if (likely(src)) {
- if (copy_mc_user_highpage(dst, src, addr, vma)) {
- memory_failure_queue(page_to_pfn(src), 0);
+ if (copy_mc_user_highpage(dst, src, addr, vma))
return -EHWPOISON;
- }
return 0;
}
@@ -6417,10 +6415,8 @@ static int copy_user_gigantic_page(struct folio *dst, struct folio *src,
cond_resched();
if (copy_mc_user_highpage(dst_page, src_page,
- addr + i*PAGE_SIZE, vma)) {
- memory_failure_queue(page_to_pfn(src_page), 0);
+ addr + i*PAGE_SIZE, vma))
return -EHWPOISON;
- }
}
return 0;
}
@@ -6437,10 +6433,8 @@ static int copy_subpage(unsigned long addr, int idx, void *arg)
struct page *dst = nth_page(copy_arg->dst, idx);
struct page *src = nth_page(copy_arg->src, idx);
- if (copy_mc_user_highpage(dst, src, addr, copy_arg->vma)) {
- memory_failure_queue(page_to_pfn(src), 0);
+ if (copy_mc_user_highpage(dst, src, addr, copy_arg->vma))
return -EHWPOISON;
- }
return 0;
}
--
2.27.0
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v4 2/6] mm: add folio_mc_copy()
2024-06-03 9:24 [PATCH v4 0/6] mm: migrate: support poison recover from migrate folio Kefeng Wang
2024-06-03 9:24 ` [PATCH v4 1/6] mm: move memory_failure_queue() into copy_mc_[user]_highpage() Kefeng Wang
@ 2024-06-03 9:24 ` Kefeng Wang
2024-06-04 19:41 ` Jane Chu
2024-06-06 2:36 ` Miaohe Lin
2024-06-03 9:24 ` [PATCH v4 3/6] mm: migrate: split folio_migrate_mapping() Kefeng Wang
` (3 subsequent siblings)
5 siblings, 2 replies; 25+ messages in thread
From: Kefeng Wang @ 2024-06-03 9:24 UTC (permalink / raw)
To: akpm, linux-mm
Cc: Tony Luck, Miaohe Lin, nao.horiguchi, Matthew Wilcox,
David Hildenbrand, Muchun Song, Benjamin LaHaise, jglisse,
Jiaqi Yan, Hugh Dickins, Vishal Moola, Alistair Popple, Jane Chu,
Oscar Salvador, Lance Yang, Kefeng Wang
Add a #MC variant of folio_copy() which uses copy_mc_highpage() so that
a #MC raised during the folio copy can be handled; it will be used in
folio migration soon.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
include/linux/mm.h | 1 +
mm/util.c | 17 +++++++++++++++++
2 files changed, 18 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c499d5adb748..cc21b2f0cdf8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1321,6 +1321,7 @@ void put_pages_list(struct list_head *pages);
void split_page(struct page *page, unsigned int order);
void folio_copy(struct folio *dst, struct folio *src);
+int folio_mc_copy(struct folio *dst, struct folio *src);
unsigned long nr_free_buffer_pages(void);
diff --git a/mm/util.c b/mm/util.c
index c9e519e6811f..90ea0c0f10d3 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -828,6 +828,23 @@ void folio_copy(struct folio *dst, struct folio *src)
}
EXPORT_SYMBOL(folio_copy);
+int folio_mc_copy(struct folio *dst, struct folio *src)
+{
+ long nr = folio_nr_pages(src);
+ long i = 0;
+
+ for (;;) {
+ if (copy_mc_highpage(folio_page(dst, i), folio_page(src, i)))
+ return -EHWPOISON;
+ if (++i == nr)
+ break;
+ cond_resched();
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL(folio_mc_copy);
+
int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_GUESS;
int sysctl_overcommit_ratio __read_mostly = 50;
unsigned long sysctl_overcommit_kbytes __read_mostly;
--
2.27.0
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v4 3/6] mm: migrate: split folio_migrate_mapping()
2024-06-03 9:24 [PATCH v4 0/6] mm: migrate: support poison recover from migrate folio Kefeng Wang
2024-06-03 9:24 ` [PATCH v4 1/6] mm: move memory_failure_queue() into copy_mc_[user]_highpage() Kefeng Wang
2024-06-03 9:24 ` [PATCH v4 2/6] mm: add folio_mc_copy() Kefeng Wang
@ 2024-06-03 9:24 ` Kefeng Wang
2024-06-06 0:54 ` Jane Chu
2024-06-06 18:28 ` Jane Chu
2024-06-03 9:24 ` [PATCH v4 4/6] mm: migrate: support poisoned recover from migrate folio Kefeng Wang
` (2 subsequent siblings)
5 siblings, 2 replies; 25+ messages in thread
From: Kefeng Wang @ 2024-06-03 9:24 UTC (permalink / raw)
To: akpm, linux-mm
Cc: Tony Luck, Miaohe Lin, nao.horiguchi, Matthew Wilcox,
David Hildenbrand, Muchun Song, Benjamin LaHaise, jglisse,
Jiaqi Yan, Hugh Dickins, Vishal Moola, Alistair Popple, Jane Chu,
Oscar Salvador, Lance Yang, Kefeng Wang
Move the folio refcount check (for the !mapping case) and the
folio_ref_freeze() (for the mapping case) out of the original
folio_migrate_mapping(); there is no turning back once the new
__folio_migrate_mapping() is entered. Also update the comments from
page to folio.
Note that folio_ref_freeze() is moved out of xas_lock_irq(). Since the
folio is already isolated and locked during migration, this should be
no functional change.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
mm/migrate.c | 63 ++++++++++++++++++++++++++--------------------------
1 file changed, 32 insertions(+), 31 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index e04b451c4289..e930376c261a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -393,50 +393,36 @@ static int folio_expected_refs(struct address_space *mapping,
}
/*
- * Replace the page in the mapping.
+ * Replace the folio in the mapping.
*
* The number of remaining references must be:
- * 1 for anonymous pages without a mapping
- * 2 for pages with a mapping
- * 3 for pages with a mapping and PagePrivate/PagePrivate2 set.
+ * 1 for anonymous folios without a mapping
+ * 2 for folios with a mapping
+ * 3 for folios with a mapping and PagePrivate/PagePrivate2 set.
*/
-int folio_migrate_mapping(struct address_space *mapping,
- struct folio *newfolio, struct folio *folio, int extra_count)
+static void __folio_migrate_mapping(struct address_space *mapping,
+ struct folio *newfolio, struct folio *folio, int expected_cnt)
{
XA_STATE(xas, &mapping->i_pages, folio_index(folio));
struct zone *oldzone, *newzone;
- int dirty;
- int expected_count = folio_expected_refs(mapping, folio) + extra_count;
long nr = folio_nr_pages(folio);
long entries, i;
+ int dirty;
if (!mapping) {
- /* Anonymous page without mapping */
- if (folio_ref_count(folio) != expected_count)
- return -EAGAIN;
-
- /* No turning back from here */
+ /* Anonymous folio without mapping */
newfolio->index = folio->index;
newfolio->mapping = folio->mapping;
if (folio_test_swapbacked(folio))
__folio_set_swapbacked(newfolio);
-
- return MIGRATEPAGE_SUCCESS;
+ return;
}
oldzone = folio_zone(folio);
newzone = folio_zone(newfolio);
xas_lock_irq(&xas);
- if (!folio_ref_freeze(folio, expected_count)) {
- xas_unlock_irq(&xas);
- return -EAGAIN;
- }
-
- /*
- * Now we know that no one else is looking at the folio:
- * no turning back from here.
- */
+ /* Now we know that no one else is looking at the folio */
newfolio->index = folio->index;
newfolio->mapping = folio->mapping;
folio_ref_add(newfolio, nr); /* add cache reference */
@@ -452,7 +438,7 @@ int folio_migrate_mapping(struct address_space *mapping,
entries = 1;
}
- /* Move dirty while page refs frozen and newpage not yet exposed */
+ /* Move dirty while folio refs frozen and newfolio not yet exposed */
dirty = folio_test_dirty(folio);
if (dirty) {
folio_clear_dirty(folio);
@@ -466,22 +452,22 @@ int folio_migrate_mapping(struct address_space *mapping,
}
/*
- * Drop cache reference from old page by unfreezing
- * to one less reference.
+ * Since old folio's refcount freezed, now drop cache reference from
+ * old folio by unfreezing to one less reference.
* We know this isn't the last reference.
*/
- folio_ref_unfreeze(folio, expected_count - nr);
+ folio_ref_unfreeze(folio, expected_cnt - nr);
xas_unlock(&xas);
/* Leave irq disabled to prevent preemption while updating stats */
/*
* If moved to a different zone then also account
- * the page for that zone. Other VM counters will be
+ * the folio for that zone. Other VM counters will be
* taken care of when we establish references to the
- * new page and drop references to the old page.
+ * new folio and drop references to the old folio.
*
- * Note that anonymous pages are accounted for
+ * Note that anonymous folios are accounted for
* via NR_FILE_PAGES and NR_ANON_MAPPED if they
* are mapped to swap space.
*/
@@ -518,7 +504,22 @@ int folio_migrate_mapping(struct address_space *mapping,
}
}
local_irq_enable();
+}
+
+int folio_migrate_mapping(struct address_space *mapping, struct folio *newfolio,
+ struct folio *folio, int extra_count)
+{
+ int expected_cnt = folio_expected_refs(mapping, folio) + extra_count;
+
+ if (!mapping) {
+ if (folio_ref_count(folio) != expected_cnt)
+ return -EAGAIN;
+ } else {
+ if (!folio_ref_freeze(folio, expected_cnt))
+ return -EAGAIN;
+ }
+ __folio_migrate_mapping(mapping, newfolio, folio, expected_cnt);
return MIGRATEPAGE_SUCCESS;
}
EXPORT_SYMBOL(folio_migrate_mapping);
--
2.27.0
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v4 4/6] mm: migrate: support poisoned recover from migrate folio
2024-06-03 9:24 [PATCH v4 0/6] mm: migrate: support poison recover from migrate folio Kefeng Wang
` (2 preceding siblings ...)
2024-06-03 9:24 ` [PATCH v4 3/6] mm: migrate: split folio_migrate_mapping() Kefeng Wang
@ 2024-06-03 9:24 ` Kefeng Wang
2024-06-06 21:27 ` Jane Chu
2024-06-03 9:24 ` [PATCH v4 5/6] fs: hugetlbfs: support poison recover from hugetlbfs_migrate_folio() Kefeng Wang
2024-06-03 9:24 ` [PATCH v4 6/6] mm: migrate: remove folio_migrate_copy() Kefeng Wang
5 siblings, 1 reply; 25+ messages in thread
From: Kefeng Wang @ 2024-06-03 9:24 UTC (permalink / raw)
To: akpm, linux-mm
Cc: Tony Luck, Miaohe Lin, nao.horiguchi, Matthew Wilcox,
David Hildenbrand, Muchun Song, Benjamin LaHaise, jglisse,
Jiaqi Yan, Hugh Dickins, Vishal Moola, Alistair Popple, Jane Chu,
Oscar Salvador, Lance Yang, Kefeng Wang
Folio migration is widely used in the kernel: memory compaction, memory
hotplug, soft page offlining, NUMA balancing, memory demotion/promotion,
etc. But once a poisoned source folio is accessed while migrating, the
kernel will panic.
There is a mechanism in the kernel to recover from uncorrectable memory
errors, ARCH_HAS_COPY_MC, which is already used in other core-mm paths,
e.g. CoW, khugepaged, coredump and KSM copy; see the
copy_mc_to_{user,kernel} and copy_mc_{user_}highpage callers.
To support recovery from a poisoned folio copy during folio migration, we
choose to make folio migration tolerant of memory failures and return an
error instead: since folio migration is never guaranteed to succeed, this
is acceptable and avoids panics like the one shown below.
CPU: 1 PID: 88343 Comm: test_softofflin Kdump: loaded Not tainted 6.6.0
pc : copy_page+0x10/0xc0
lr : copy_highpage+0x38/0x50
...
Call trace:
copy_page+0x10/0xc0
folio_copy+0x78/0x90
migrate_folio_extra+0x54/0xa0
move_to_new_folio+0xd8/0x1f0
migrate_folio_move+0xb8/0x300
migrate_pages_batch+0x528/0x788
migrate_pages_sync+0x8c/0x258
migrate_pages+0x440/0x528
soft_offline_in_use_page+0x2ec/0x3c0
soft_offline_page+0x238/0x310
soft_offline_page_store+0x6c/0xc0
dev_attr_store+0x20/0x40
sysfs_kf_write+0x4c/0x68
kernfs_fop_write_iter+0x130/0x1c8
new_sync_write+0xa4/0x138
vfs_write+0x238/0x2d8
ksys_write+0x74/0x110
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
mm/migrate.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index e930376c261a..28aa9da95781 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -663,16 +663,29 @@ static int __migrate_folio(struct address_space *mapping, struct folio *dst,
struct folio *src, void *src_private,
enum migrate_mode mode)
{
- int rc;
+ int ret, expected_cnt = folio_expected_refs(mapping, src);
- rc = folio_migrate_mapping(mapping, dst, src, 0);
- if (rc != MIGRATEPAGE_SUCCESS)
- return rc;
+ if (!mapping) {
+ if (folio_ref_count(src) != expected_cnt)
+ return -EAGAIN;
+ } else {
+ if (!folio_ref_freeze(src, expected_cnt))
+ return -EAGAIN;
+ }
+
+ ret = folio_mc_copy(dst, src);
+ if (unlikely(ret)) {
+ if (mapping)
+ folio_ref_unfreeze(src, expected_cnt);
+ return ret;
+ }
+
+ __folio_migrate_mapping(mapping, dst, src, expected_cnt);
if (src_private)
folio_attach_private(dst, folio_detach_private(src));
- folio_migrate_copy(dst, src);
+ folio_migrate_flags(dst, src);
return MIGRATEPAGE_SUCCESS;
}
--
2.27.0
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v4 5/6] fs: hugetlbfs: support poison recover from hugetlbfs_migrate_folio()
2024-06-03 9:24 [PATCH v4 0/6] mm: migrate: support poison recover from migrate folio Kefeng Wang
` (3 preceding siblings ...)
2024-06-03 9:24 ` [PATCH v4 4/6] mm: migrate: support poisoned recover from migrate folio Kefeng Wang
@ 2024-06-03 9:24 ` Kefeng Wang
2024-06-06 23:30 ` Jane Chu
2024-06-03 9:24 ` [PATCH v4 6/6] mm: migrate: remove folio_migrate_copy() Kefeng Wang
5 siblings, 1 reply; 25+ messages in thread
From: Kefeng Wang @ 2024-06-03 9:24 UTC (permalink / raw)
To: akpm, linux-mm
Cc: Tony Luck, Miaohe Lin, nao.horiguchi, Matthew Wilcox,
David Hildenbrand, Muchun Song, Benjamin LaHaise, jglisse,
Jiaqi Yan, Hugh Dickins, Vishal Moola, Alistair Popple, Jane Chu,
Oscar Salvador, Lance Yang, Kefeng Wang
Similar to __migrate_folio(), use folio_mc_copy() in HugeTLB folio
migration to avoid a panic when copying from a poisoned folio.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
fs/hugetlbfs/inode.c | 2 +-
mm/migrate.c | 14 +++++++++-----
2 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 6df794ed4066..1107e5aa8343 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -1128,7 +1128,7 @@ static int hugetlbfs_migrate_folio(struct address_space *mapping,
hugetlb_set_folio_subpool(src, NULL);
}
- folio_migrate_copy(dst, src);
+ folio_migrate_flags(dst, src);
return MIGRATEPAGE_SUCCESS;
}
diff --git a/mm/migrate.c b/mm/migrate.c
index 28aa9da95781..e9b52a86f539 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -532,15 +532,19 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
struct folio *dst, struct folio *src)
{
XA_STATE(xas, &mapping->i_pages, folio_index(src));
- int expected_count;
+ int ret, expected_count = folio_expected_refs(mapping, src);
- xas_lock_irq(&xas);
- expected_count = folio_expected_refs(mapping, src);
- if (!folio_ref_freeze(src, expected_count)) {
- xas_unlock_irq(&xas);
+ if (!folio_ref_freeze(src, expected_count))
return -EAGAIN;
+
+ ret = folio_mc_copy(dst, src);
+ if (unlikely(ret)) {
+ folio_ref_unfreeze(src, expected_count);
+ return ret;
}
+ xas_lock_irq(&xas);
+
dst->index = src->index;
dst->mapping = src->mapping;
--
2.27.0
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v4 6/6] mm: migrate: remove folio_migrate_copy()
2024-06-03 9:24 [PATCH v4 0/6] mm: migrate: support poison recover from migrate folio Kefeng Wang
` (4 preceding siblings ...)
2024-06-03 9:24 ` [PATCH v4 5/6] fs: hugetlbfs: support poison recover from hugetlbfs_migrate_folio() Kefeng Wang
@ 2024-06-03 9:24 ` Kefeng Wang
2024-06-06 23:46 ` Jane Chu
5 siblings, 1 reply; 25+ messages in thread
From: Kefeng Wang @ 2024-06-03 9:24 UTC (permalink / raw)
To: akpm, linux-mm
Cc: Tony Luck, Miaohe Lin, nao.horiguchi, Matthew Wilcox,
David Hildenbrand, Muchun Song, Benjamin LaHaise, jglisse,
Jiaqi Yan, Hugh Dickins, Vishal Moola, Alistair Popple, Jane Chu,
Oscar Salvador, Lance Yang, Kefeng Wang
folio_migrate_copy() is just a wrapper around folio_copy() and
folio_migrate_flags(); it is simple and only aio uses it now, so
open-code it there and remove folio_migrate_copy().
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
fs/aio.c | 3 ++-
include/linux/migrate.h | 1 -
mm/migrate.c | 7 -------
3 files changed, 2 insertions(+), 9 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index 07ff8bbdcd2a..bcee11fcb08b 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -455,7 +455,8 @@ static int aio_migrate_folio(struct address_space *mapping, struct folio *dst,
* events from being lost.
*/
spin_lock_irqsave(&ctx->completion_lock, flags);
- folio_migrate_copy(dst, src);
+ folio_copy(dst, src);
+ folio_migrate_flags(dst, src);
BUG_ON(ctx->ring_folios[idx] != src);
ctx->ring_folios[idx] = dst;
spin_unlock_irqrestore(&ctx->completion_lock, flags);
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 517f70b70620..f9d92482d117 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -76,7 +76,6 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
void migration_entry_wait_on_locked(swp_entry_t entry, spinlock_t *ptl)
__releases(ptl);
void folio_migrate_flags(struct folio *newfolio, struct folio *folio);
-void folio_migrate_copy(struct folio *newfolio, struct folio *folio);
int folio_migrate_mapping(struct address_space *mapping,
struct folio *newfolio, struct folio *folio, int extra_count);
diff --git a/mm/migrate.c b/mm/migrate.c
index e9b52a86f539..fe6ea3fb896e 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -652,13 +652,6 @@ void folio_migrate_flags(struct folio *newfolio, struct folio *folio)
}
EXPORT_SYMBOL(folio_migrate_flags);
-void folio_migrate_copy(struct folio *newfolio, struct folio *folio)
-{
- folio_copy(newfolio, folio);
- folio_migrate_flags(newfolio, folio);
-}
-EXPORT_SYMBOL(folio_migrate_copy);
-
/************************************************************
* Migration functions
***********************************************************/
--
2.27.0
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 1/6] mm: move memory_failure_queue() into copy_mc_[user]_highpage()
2024-06-03 9:24 ` [PATCH v4 1/6] mm: move memory_failure_queue() into copy_mc_[user]_highpage() Kefeng Wang
@ 2024-06-04 19:38 ` Jane Chu
2024-06-06 2:28 ` Miaohe Lin
1 sibling, 0 replies; 25+ messages in thread
From: Jane Chu @ 2024-06-04 19:38 UTC (permalink / raw)
To: Kefeng Wang, akpm, linux-mm
Cc: Tony Luck, Miaohe Lin, nao.horiguchi, Matthew Wilcox,
David Hildenbrand, Muchun Song, Benjamin LaHaise, jglisse,
Jiaqi Yan, Hugh Dickins, Vishal Moola, Alistair Popple,
Oscar Salvador, Lance Yang
On 6/3/2024 2:24 AM, Kefeng Wang wrote:
> There is a memory_failure_queue() call after copy_mc_[user]_highpage(),
> see callers, eg, CoW/KSM page copy, it is used to mark the source page
> as h/w poisoned and unmap it from other tasks, and the upcomming poison
> recover from migrate folio will do the similar thing, so let's move the
> memory_failure_queue() into the copy_mc_[user]_highpage() instead of
> adding it into each user, this should also enhance the handling of
> poisoned page in khugepaged.
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
> include/linux/highmem.h | 6 ++++++
> mm/ksm.c | 1 -
> mm/memory.c | 12 +++---------
> 3 files changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index 00341b56d291..6b0d6f3c8580 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -352,6 +352,9 @@ static inline int copy_mc_user_highpage(struct page *to, struct page *from,
> kunmap_local(vto);
> kunmap_local(vfrom);
>
> + if (ret)
> + memory_failure_queue(page_to_pfn(from), 0);
> +
> return ret;
> }
>
> @@ -368,6 +371,9 @@ static inline int copy_mc_highpage(struct page *to, struct page *from)
> kunmap_local(vto);
> kunmap_local(vfrom);
>
> + if (ret)
> + memory_failure_queue(page_to_pfn(from), 0);
> +
> return ret;
> }
> #else
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 452ac8346e6e..3d95e5a9f301 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -3091,7 +3091,6 @@ struct folio *ksm_might_need_to_copy(struct folio *folio,
> if (copy_mc_user_highpage(folio_page(new_folio, 0), page,
> addr, vma)) {
> folio_put(new_folio);
> - memory_failure_queue(folio_pfn(folio), 0);
> return ERR_PTR(-EHWPOISON);
> }
> folio_set_dirty(new_folio);
> diff --git a/mm/memory.c b/mm/memory.c
> index 63f9f98b47bd..e06de844eaba 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3034,10 +3034,8 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src,
> unsigned long addr = vmf->address;
>
> if (likely(src)) {
> - if (copy_mc_user_highpage(dst, src, addr, vma)) {
> - memory_failure_queue(page_to_pfn(src), 0);
> + if (copy_mc_user_highpage(dst, src, addr, vma))
> return -EHWPOISON;
> - }
> return 0;
> }
>
> @@ -6417,10 +6415,8 @@ static int copy_user_gigantic_page(struct folio *dst, struct folio *src,
>
> cond_resched();
> if (copy_mc_user_highpage(dst_page, src_page,
> - addr + i*PAGE_SIZE, vma)) {
> - memory_failure_queue(page_to_pfn(src_page), 0);
> + addr + i*PAGE_SIZE, vma))
> return -EHWPOISON;
> - }
> }
> return 0;
> }
> @@ -6437,10 +6433,8 @@ static int copy_subpage(unsigned long addr, int idx, void *arg)
> struct page *dst = nth_page(copy_arg->dst, idx);
> struct page *src = nth_page(copy_arg->src, idx);
>
> - if (copy_mc_user_highpage(dst, src, addr, copy_arg->vma)) {
> - memory_failure_queue(page_to_pfn(src), 0);
> + if (copy_mc_user_highpage(dst, src, addr, copy_arg->vma))
> return -EHWPOISON;
> - }
> return 0;
> }
>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Thanks!
-jane
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 2/6] mm: add folio_mc_copy()
2024-06-03 9:24 ` [PATCH v4 2/6] mm: add folio_mc_copy() Kefeng Wang
@ 2024-06-04 19:41 ` Jane Chu
2024-06-05 3:31 ` Andrew Morton
2024-06-06 2:36 ` Miaohe Lin
1 sibling, 1 reply; 25+ messages in thread
From: Jane Chu @ 2024-06-04 19:41 UTC (permalink / raw)
To: Kefeng Wang, akpm, linux-mm
Cc: Tony Luck, Miaohe Lin, nao.horiguchi, Matthew Wilcox,
David Hildenbrand, Muchun Song, Benjamin LaHaise, jglisse,
Jiaqi Yan, Hugh Dickins, Vishal Moola, Alistair Popple,
Oscar Salvador, Lance Yang
On 6/3/2024 2:24 AM, Kefeng Wang wrote:
> Add a #MC variant of folio_copy() which uses copy_mc_highpage() to
> support #MC handled during folio copy, it will be used in folio
> migration soon.
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
> include/linux/mm.h | 1 +
> mm/util.c | 17 +++++++++++++++++
> 2 files changed, 18 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index c499d5adb748..cc21b2f0cdf8 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1321,6 +1321,7 @@ void put_pages_list(struct list_head *pages);
>
> void split_page(struct page *page, unsigned int order);
> void folio_copy(struct folio *dst, struct folio *src);
> +int folio_mc_copy(struct folio *dst, struct folio *src);
>
> unsigned long nr_free_buffer_pages(void);
>
> diff --git a/mm/util.c b/mm/util.c
> index c9e519e6811f..90ea0c0f10d3 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -828,6 +828,23 @@ void folio_copy(struct folio *dst, struct folio *src)
> }
> EXPORT_SYMBOL(folio_copy);
>
> +int folio_mc_copy(struct folio *dst, struct folio *src)
> +{
> + long nr = folio_nr_pages(src);
> + long i = 0;
> +
> + for (;;) {
> + if (copy_mc_highpage(folio_page(dst, i), folio_page(src, i)))
> + return -EHWPOISON;
> + if (++i == nr)
> + break;
> + cond_resched();
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(folio_mc_copy);
> +
> int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_GUESS;
> int sysctl_overcommit_ratio __read_mostly = 50;
> unsigned long sysctl_overcommit_kbytes __read_mostly;
Reviewed-by: Jane Chu <jane.chu@oracle.com>
thanks,
-jane
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 2/6] mm: add folio_mc_copy()
2024-06-04 19:41 ` Jane Chu
@ 2024-06-05 3:31 ` Andrew Morton
2024-06-05 7:15 ` Kefeng Wang
0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2024-06-05 3:31 UTC (permalink / raw)
To: Jane Chu
Cc: Kefeng Wang, linux-mm, Tony Luck, Miaohe Lin, nao.horiguchi,
Matthew Wilcox, David Hildenbrand, Muchun Song, Benjamin LaHaise,
jglisse, Jiaqi Yan, Hugh Dickins, Vishal Moola, Alistair Popple,
Oscar Salvador, Lance Yang
On Tue, 4 Jun 2024 12:41:20 -0700 Jane Chu <jane.chu@oracle.com> wrote:
> > int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_GUESS;
> > int sysctl_overcommit_ratio __read_mostly = 50;
> > unsigned long sysctl_overcommit_kbytes __read_mostly;
>
> Reviewed-by: Jane Chu <jane.chu@oracle.com>
Thanks. I'll merge the series for testing, but I'd prefer to see more
review activity before proceeding further. Please.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 2/6] mm: add folio_mc_copy()
2024-06-05 3:31 ` Andrew Morton
@ 2024-06-05 7:15 ` Kefeng Wang
0 siblings, 0 replies; 25+ messages in thread
From: Kefeng Wang @ 2024-06-05 7:15 UTC (permalink / raw)
To: Andrew Morton, Jane Chu
Cc: linux-mm, Tony Luck, Miaohe Lin, nao.horiguchi, Matthew Wilcox,
David Hildenbrand, Muchun Song, Benjamin LaHaise, jglisse,
Jiaqi Yan, Hugh Dickins, Vishal Moola, Alistair Popple,
Oscar Salvador, Lance Yang
On 2024/6/5 11:31, Andrew Morton wrote:
> On Tue, 4 Jun 2024 12:41:20 -0700 Jane Chu <jane.chu@oracle.com> wrote:
>
>>> int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_GUESS;
>>> int sysctl_overcommit_ratio __read_mostly = 50;
>>> unsigned long sysctl_overcommit_kbytes __read_mostly;
>>
>> Reviewed-by: Jane Chu <jane.chu@oracle.com>
>
> Thanks. I'll merge the series for testing, but I'd prefer to see more
> review activity before proceeding further. Please.
Thanks Andrew, and thanks for all the review; more review is appreciated.
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 3/6] mm: migrate: split folio_migrate_mapping()
2024-06-03 9:24 ` [PATCH v4 3/6] mm: migrate: split folio_migrate_mapping() Kefeng Wang
@ 2024-06-06 0:54 ` Jane Chu
2024-06-06 1:24 ` Kefeng Wang
2024-06-06 18:28 ` Jane Chu
1 sibling, 1 reply; 25+ messages in thread
From: Jane Chu @ 2024-06-06 0:54 UTC (permalink / raw)
To: Kefeng Wang, akpm, linux-mm, Matthew Wilcox
[..]
> -int folio_migrate_mapping(struct address_space *mapping,
> - struct folio *newfolio, struct folio *folio, int extra_count)
> +static void __folio_migrate_mapping(struct address_space *mapping,
> + struct folio *newfolio, struct folio *folio, int expected_cnt)
> {
> XA_STATE(xas, &mapping->i_pages, folio_index(folio));
> struct zone *oldzone, *newzone;
> - int dirty;
> - int expected_count = folio_expected_refs(mapping, folio) + extra_count;
> long nr = folio_nr_pages(folio);
> long entries, i;
> + int dirty;
>
> if (!mapping) {
> - /* Anonymous page without mapping */
If 'mapping' was NULL, the first line would blow up while dereferencing
'mapping->i_pages'.
The ordering change was introduced by commit
89eb946a7432b ("mm: Convert page migration to XArray").
Matthew, please correct me if I'm missing something.
thanks,
-jane
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 3/6] mm: migrate: split folio_migrate_mapping()
2024-06-06 0:54 ` Jane Chu
@ 2024-06-06 1:24 ` Kefeng Wang
2024-06-06 1:55 ` Matthew Wilcox
0 siblings, 1 reply; 25+ messages in thread
From: Kefeng Wang @ 2024-06-06 1:24 UTC (permalink / raw)
To: Jane Chu, akpm, linux-mm, Matthew Wilcox
On 2024/6/6 8:54, Jane Chu wrote:
> [..]
>> -int folio_migrate_mapping(struct address_space *mapping,
>> - struct folio *newfolio, struct folio *folio, int extra_count)
>> +static void __folio_migrate_mapping(struct address_space *mapping,
>> + struct folio *newfolio, struct folio *folio, int expected_cnt)
>> {
>> XA_STATE(xas, &mapping->i_pages, folio_index(folio));
>> struct zone *oldzone, *newzone;
>> - int dirty;
>> - int expected_count = folio_expected_refs(mapping, folio) + extra_count;
>> long nr = folio_nr_pages(folio);
>> long entries, i;
>> + int dirty;
>>
>> if (!mapping) {
>> - /* Anonymous page without mapping */
>
> If 'mapping' was NULL, the first line would blow up while dereferencing
> 'mapping->i_pages'.
0->i_pages is wrong, but &(0->i_pages) is legal, and then xas->xa = NULL.
>
> The ordering change was introduced by commit
>
> 89eb946a7432b "(mm: Convert page migration to XArray)"
>
> Matthew, please correct me if I'm missing something.
>
> thanks,
>
> -jane
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 3/6] mm: migrate: split folio_migrate_mapping()
2024-06-06 1:24 ` Kefeng Wang
@ 2024-06-06 1:55 ` Matthew Wilcox
2024-06-06 2:24 ` Kefeng Wang
0 siblings, 1 reply; 25+ messages in thread
From: Matthew Wilcox @ 2024-06-06 1:55 UTC (permalink / raw)
To: Kefeng Wang; +Cc: Jane Chu, akpm, linux-mm
On Thu, Jun 06, 2024 at 09:24:37AM +0800, Kefeng Wang wrote:
> On 2024/6/6 8:54, Jane Chu wrote:
> > [..]
> > > -int folio_migrate_mapping(struct address_space *mapping,
> > > - struct folio *newfolio, struct folio *folio, int extra_count)
> > > +static void __folio_migrate_mapping(struct address_space *mapping,
> > > + struct folio *newfolio, struct folio *folio, int expected_cnt)
> > > {
> > > XA_STATE(xas, &mapping->i_pages, folio_index(folio));
> > > struct zone *oldzone, *newzone;
> > > - int dirty;
> > > - int expected_count = folio_expected_refs(mapping, folio) + extra_count;
> > > long nr = folio_nr_pages(folio);
> > > long entries, i;
> > > + int dirty;
> > > if (!mapping) {
> > > - /* Anonymous page without mapping */
> >
> > If 'mapping' was NULL, the first line would blow up while dereferencing
> > 'mapping->i_pages'.
>
> 0->i_pages is wrong, but &(0->i_pages) is legal, and then xas->xa = NULL.
Uhh, it's not NULL, but it will be a small integer (offsetof(struct
address_space, i_pages)).
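A tiny userspace illustration of that point (hypothetical struct layout, just
to show that no load happens and the result is the member offset, not NULL):
#include <stddef.h>
#include <stdio.h>
/* Stand-in for struct address_space; only the layout matters here. */
struct address_space_like {
	void *host;
	void *i_pages;
};
int main(void)
{
	struct address_space_like *mapping = NULL;
	/*
	 * &mapping->i_pages is pure pointer arithmetic, nothing is
	 * dereferenced, so with mapping == NULL it evaluates to the member
	 * offset (strictly undefined in ISO C, but it is what keeps the
	 * XA_STATE() line from blowing up when mapping is NULL).
	 */
	printf("xas.xa would be %p, offsetof() is %zu\n",
	       (void *)&mapping->i_pages,
	       offsetof(struct address_space_like, i_pages));
	return 0;
}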
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 3/6] mm: migrate: split folio_migrate_mapping()
2024-06-06 1:55 ` Matthew Wilcox
@ 2024-06-06 2:24 ` Kefeng Wang
0 siblings, 0 replies; 25+ messages in thread
From: Kefeng Wang @ 2024-06-06 2:24 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Jane Chu, akpm, linux-mm
On 2024/6/6 9:55, Matthew Wilcox wrote:
> On Thu, Jun 06, 2024 at 09:24:37AM +0800, Kefeng Wang wrote:
>> On 2024/6/6 8:54, Jane Chu wrote:
>>> [..]
>>>> -int folio_migrate_mapping(struct address_space *mapping,
>>>> - struct folio *newfolio, struct folio *folio, int extra_count)
>>>> +static void __folio_migrate_mapping(struct address_space *mapping,
>>>> + struct folio *newfolio, struct folio *folio, int expected_cnt)
>>>> {
>>>> XA_STATE(xas, &mapping->i_pages, folio_index(folio));
>>>> struct zone *oldzone, *newzone;
>>>> - int dirty;
>>>> - int expected_count = folio_expected_refs(mapping, folio) + extra_count;
>>>> long nr = folio_nr_pages(folio);
>>>> long entries, i;
>>>> + int dirty;
>>>> if (!mapping) {
>>>> - /* Anonymous page without mapping */
>>>
>>> If 'mapping' was NULL, the first line would blow up while dereferencing
>>> 'mapping->i_pages'.
>>
>> 0->i_pages is wrong, but &(0->i_pages) is legal, and then xas->xa = NULL.
>
> Uhh, it's not NULL, but it will be a small integer (offsetof(struct
> address_space, i_pages)).
Oh, indeed, I forgot about the offset, thanks.
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 1/6] mm: move memory_failure_queue() into copy_mc_[user]_highpage()
2024-06-03 9:24 ` [PATCH v4 1/6] mm: move memory_failure_queue() into copy_mc_[user]_highpage() Kefeng Wang
2024-06-04 19:38 ` Jane Chu
@ 2024-06-06 2:28 ` Miaohe Lin
1 sibling, 0 replies; 25+ messages in thread
From: Miaohe Lin @ 2024-06-06 2:28 UTC (permalink / raw)
To: Kefeng Wang, akpm, linux-mm
Cc: Tony Luck, nao.horiguchi, Matthew Wilcox, David Hildenbrand,
Muchun Song, Benjamin LaHaise, jglisse, Jiaqi Yan, Hugh Dickins,
Vishal Moola, Alistair Popple, Jane Chu, Oscar Salvador,
Lance Yang
On 2024/6/3 17:24, Kefeng Wang wrote:
> There is a memory_failure_queue() call after copy_mc_[user]_highpage(),
> see callers, eg, CoW/KSM page copy, it is used to mark the source page
> as h/w poisoned and unmap it from other tasks, and the upcomming poison
> recover from migrate folio will do the similar thing, so let's move the
> memory_failure_queue() into the copy_mc_[user]_highpage() instead of
> adding it into each user, this should also enhance the handling of
> poisoned page in khugepaged.
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
LGTM. Thanks.
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Thanks.
.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 2/6] mm: add folio_mc_copy()
2024-06-03 9:24 ` [PATCH v4 2/6] mm: add folio_mc_copy() Kefeng Wang
2024-06-04 19:41 ` Jane Chu
@ 2024-06-06 2:36 ` Miaohe Lin
1 sibling, 0 replies; 25+ messages in thread
From: Miaohe Lin @ 2024-06-06 2:36 UTC (permalink / raw)
To: Kefeng Wang, akpm, linux-mm
Cc: Tony Luck, nao.horiguchi, Matthew Wilcox, David Hildenbrand,
Muchun Song, Benjamin LaHaise, jglisse, Jiaqi Yan, Hugh Dickins,
Vishal Moola, Alistair Popple, Jane Chu, Oscar Salvador,
Lance Yang
On 2024/6/3 17:24, Kefeng Wang wrote:
> Add a #MC variant of folio_copy() which uses copy_mc_highpage() to
> support #MC handled during folio copy, it will be used in folio
> migration soon.
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Thanks.
.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 3/6] mm: migrate: split folio_migrate_mapping()
2024-06-03 9:24 ` [PATCH v4 3/6] mm: migrate: split folio_migrate_mapping() Kefeng Wang
2024-06-06 0:54 ` Jane Chu
@ 2024-06-06 18:28 ` Jane Chu
1 sibling, 0 replies; 25+ messages in thread
From: Jane Chu @ 2024-06-06 18:28 UTC (permalink / raw)
To: Kefeng Wang, akpm, linux-mm
Cc: Tony Luck, Miaohe Lin, nao.horiguchi, Matthew Wilcox,
David Hildenbrand, Muchun Song, Benjamin LaHaise, jglisse,
Jiaqi Yan, Hugh Dickins, Vishal Moola, Alistair Popple,
Oscar Salvador, Lance Yang
On 6/3/2024 2:24 AM, Kefeng Wang wrote:
> The folio refcount check for !mapping and folio_ref_freeze() for
> mapping are moved out of the original folio_migrate_mapping(), and
> there is no turning back for the new __folio_migrate_mapping(),
> also update comment from page to folio.
>
> Note, the folio_ref_freeze() is moved out of xas_lock_irq(),
> Since the folio is already isolated and locked during migration,
> so suppose that there is no functional change.
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
> mm/migrate.c | 63 ++++++++++++++++++++++++++--------------------------
> 1 file changed, 32 insertions(+), 31 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index e04b451c4289..e930376c261a 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -393,50 +393,36 @@ static int folio_expected_refs(struct address_space *mapping,
> }
>
> /*
> - * Replace the page in the mapping.
> + * Replace the folio in the mapping.
> *
> * The number of remaining references must be:
> - * 1 for anonymous pages without a mapping
> - * 2 for pages with a mapping
> - * 3 for pages with a mapping and PagePrivate/PagePrivate2 set.
> + * 1 for anonymous folios without a mapping
> + * 2 for folios with a mapping
> + * 3 for folios with a mapping and PagePrivate/PagePrivate2 set.
> */
> -int folio_migrate_mapping(struct address_space *mapping,
> - struct folio *newfolio, struct folio *folio, int extra_count)
> +static void __folio_migrate_mapping(struct address_space *mapping,
> + struct folio *newfolio, struct folio *folio, int expected_cnt)
> {
> XA_STATE(xas, &mapping->i_pages, folio_index(folio));
> struct zone *oldzone, *newzone;
> - int dirty;
> - int expected_count = folio_expected_refs(mapping, folio) + extra_count;
> long nr = folio_nr_pages(folio);
> long entries, i;
> + int dirty;
>
> if (!mapping) {
> - /* Anonymous page without mapping */
> - if (folio_ref_count(folio) != expected_count)
> - return -EAGAIN;
> -
> - /* No turning back from here */
> + /* Anonymous folio without mapping */
> newfolio->index = folio->index;
> newfolio->mapping = folio->mapping;
> if (folio_test_swapbacked(folio))
> __folio_set_swapbacked(newfolio);
> -
> - return MIGRATEPAGE_SUCCESS;
> + return;
> }
>
> oldzone = folio_zone(folio);
> newzone = folio_zone(newfolio);
>
> xas_lock_irq(&xas);
> - if (!folio_ref_freeze(folio, expected_count)) {
> - xas_unlock_irq(&xas);
> - return -EAGAIN;
> - }
> -
> - /*
> - * Now we know that no one else is looking at the folio:
> - * no turning back from here.
> - */
> + /* Now we know that no one else is looking at the folio */
> newfolio->index = folio->index;
> newfolio->mapping = folio->mapping;
> folio_ref_add(newfolio, nr); /* add cache reference */
> @@ -452,7 +438,7 @@ int folio_migrate_mapping(struct address_space *mapping,
> entries = 1;
> }
>
> - /* Move dirty while page refs frozen and newpage not yet exposed */
> + /* Move dirty while folio refs frozen and newfolio not yet exposed */
> dirty = folio_test_dirty(folio);
> if (dirty) {
> folio_clear_dirty(folio);
> @@ -466,22 +452,22 @@ int folio_migrate_mapping(struct address_space *mapping,
> }
>
> /*
> - * Drop cache reference from old page by unfreezing
> - * to one less reference.
> + * Since old folio's refcount freezed, now drop cache reference from
> + * old folio by unfreezing to one less reference.
> * We know this isn't the last reference.
> */
> - folio_ref_unfreeze(folio, expected_count - nr);
> + folio_ref_unfreeze(folio, expected_cnt - nr);
>
> xas_unlock(&xas);
> /* Leave irq disabled to prevent preemption while updating stats */
>
> /*
> * If moved to a different zone then also account
> - * the page for that zone. Other VM counters will be
> + * the folio for that zone. Other VM counters will be
> * taken care of when we establish references to the
> - * new page and drop references to the old page.
> + * new folio and drop references to the old folio.
> *
> - * Note that anonymous pages are accounted for
> + * Note that anonymous folios are accounted for
> * via NR_FILE_PAGES and NR_ANON_MAPPED if they
> * are mapped to swap space.
> */
> @@ -518,7 +504,22 @@ int folio_migrate_mapping(struct address_space *mapping,
> }
> }
> local_irq_enable();
> +}
> +
> +int folio_migrate_mapping(struct address_space *mapping, struct folio *newfolio,
> + struct folio *folio, int extra_count)
> +{
> + int expected_cnt = folio_expected_refs(mapping, folio) + extra_count;
> +
> + if (!mapping) {
> + if (folio_ref_count(folio) != expected_cnt)
> + return -EAGAIN;
> + } else {
> + if (!folio_ref_freeze(folio, expected_cnt))
> + return -EAGAIN;
> + }
>
> + __folio_migrate_mapping(mapping, newfolio, folio, expected_cnt);
> return MIGRATEPAGE_SUCCESS;
> }
> EXPORT_SYMBOL(folio_migrate_mapping);
As far as I can tell, there is no functional change since the file-based
folio has already been isolated.
Reviewed-by: Jane Chu <jane.chu@oracle.com>
thanks,
-jane
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 4/6] mm: migrate: support poisoned recover from migrate folio
2024-06-03 9:24 ` [PATCH v4 4/6] mm: migrate: support poisoned recover from migrate folio Kefeng Wang
@ 2024-06-06 21:27 ` Jane Chu
2024-06-06 22:28 ` Jane Chu
0 siblings, 1 reply; 25+ messages in thread
From: Jane Chu @ 2024-06-06 21:27 UTC (permalink / raw)
To: Kefeng Wang, akpm, linux-mm
On 6/3/2024 2:24 AM, Kefeng Wang wrote:
> The folio migration is widely used in kernel, memory compaction, memory
> hotplug, soft offline page, numa balance, memory demote/promotion, etc,
> but once access a poisoned source folio when migrating, the kerenl will
> panic.
>
> There is a mechanism in the kernel to recover from uncorrectable memory
> errors, ARCH_HAS_COPY_MC, which is already used in other core-mm paths,
> eg, CoW, khugepaged, coredump, ksm copy, see copy_mc_to_{user,kernel},
> copy_mc_{user_}highpage callers.
>
> In order to support poisoned folio copy recover from migrate folio, we
> chose to make folio migration tolerant of memory failures and return
> error for folio migration, because folio migration is no guarantee
> of success, this could avoid the similar panic shown below.
>
> CPU: 1 PID: 88343 Comm: test_softofflin Kdump: loaded Not tainted 6.6.0
> pc : copy_page+0x10/0xc0
> lr : copy_highpage+0x38/0x50
I'm curious how you managed to test this case. I mean, you trigger a
soft_offline, a source page with a UE is being migrated, and then
folio_copy() triggers an MCE and the system panics. Did you use a bad
DIMM?
> ...
> Call trace:
> copy_page+0x10/0xc0
> folio_copy+0x78/0x90
> migrate_folio_extra+0x54/0xa0
> move_to_new_folio+0xd8/0x1f0
> migrate_folio_move+0xb8/0x300
> migrate_pages_batch+0x528/0x788
> migrate_pages_sync+0x8c/0x258
> migrate_pages+0x440/0x528
> soft_offline_in_use_page+0x2ec/0x3c0
> soft_offline_page+0x238/0x310
> soft_offline_page_store+0x6c/0xc0
> dev_attr_store+0x20/0x40
> sysfs_kf_write+0x4c/0x68
> kernfs_fop_write_iter+0x130/0x1c8
> new_sync_write+0xa4/0x138
> vfs_write+0x238/0x2d8
> ksys_write+0x74/0x110
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
> mm/migrate.c | 23 ++++++++++++++++++-----
> 1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index e930376c261a..28aa9da95781 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -663,16 +663,29 @@ static int __migrate_folio(struct address_space *mapping, struct folio *dst,
> struct folio *src, void *src_private,
> enum migrate_mode mode)
> {
> - int rc;
> + int ret, expected_cnt = folio_expected_refs(mapping, src);
>
> - rc = folio_migrate_mapping(mapping, dst, src, 0);
> - if (rc != MIGRATEPAGE_SUCCESS)
> - return rc;
> + if (!mapping) {
> + if (folio_ref_count(src) != expected_cnt)
> + return -EAGAIN;
> + } else {
> + if (!folio_ref_freeze(src, expected_cnt))
> + return -EAGAIN;
> + }
> +
Let me take a guess: the reason you split up folio_migrate_copy() is that
folio_mc_copy() should be done before the 'src' folio's ->flags is
changed, right?
Is there any other reason? Could you add a comment, please?
> + ret = folio_mc_copy(dst, src);
> + if (unlikely(ret)) {
> + if (mapping)
> + folio_ref_unfreeze(src, expected_cnt);
> + return ret;
> + }
> +
> + __folio_migrate_mapping(mapping, dst, src, expected_cnt);
>
> if (src_private)
> folio_attach_private(dst, folio_detach_private(src));
>
> - folio_migrate_copy(dst, src);
> + folio_migrate_flags(dst, src);
> return MIGRATEPAGE_SUCCESS;
> }
>
thanks,
-jane
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 4/6] mm: migrate: support poisoned recover from migrate folio
2024-06-06 21:27 ` Jane Chu
@ 2024-06-06 22:28 ` Jane Chu
2024-06-06 22:31 ` Jane Chu
0 siblings, 1 reply; 25+ messages in thread
From: Jane Chu @ 2024-06-06 22:28 UTC (permalink / raw)
To: Kefeng Wang, akpm, linux-mm
On 6/6/2024 2:27 PM, Jane Chu wrote:
> On 6/3/2024 2:24 AM, Kefeng Wang wrote:
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index e930376c261a..28aa9da95781 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -663,16 +663,29 @@ static int __migrate_folio(struct address_space
>> *mapping, struct folio *dst,
>> struct folio *src, void *src_private,
>> enum migrate_mode mode)
>> {
>> - int rc;
>> + int ret, expected_cnt = folio_expected_refs(mapping, src);
>> - rc = folio_migrate_mapping(mapping, dst, src, 0);
>> - if (rc != MIGRATEPAGE_SUCCESS)
>> - return rc;
>> + if (!mapping) {
>> + if (folio_ref_count(src) != expected_cnt)
>> + return -EAGAIN;
>> + } else {
>> + if (!folio_ref_freeze(src, expected_cnt))
>> + return -EAGAIN;
>> + }
>> +
>
> Let me take a guess, the reason you split up folio_migrate_copy() is that
>
> folio_mc_copy() should be done before the 'src' folio's ->flags is
> changed, right?
>
> Is there any other reason? Could you add a comment please?
I see: both the clearing of the 'dirty' bit in the source folio and the
xas_store of the new folio into the mapping need to be done after
folio_mc_copy(), since in the event of a UE, memory_failure() is called
to handle the poison in the source page.
That said, since the poisoned page is only queued up and the handling is
asynchronous, in theory there is an extremely unlikely chance that
memory_failure() is invoked after folio_migrate_mapping(); do you think
things would still be cool?
thanks,
-jane
>
>> + ret = folio_mc_copy(dst, src);
>> + if (unlikely(ret)) {
>> + if (mapping)
>> + folio_ref_unfreeze(src, expected_cnt);
>> + return ret;
>> + }
>> +
>> + __folio_migrate_mapping(mapping, dst, src, expected_cnt);
>> if (src_private)
>> folio_attach_private(dst, folio_detach_private(src));
>> - folio_migrate_copy(dst, src);
>> + folio_migrate_flags(dst, src);
>> return MIGRATEPAGE_SUCCESS;
>> }
>
> thanks,
>
> -jane
>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 4/6] mm: migrate: support poisoned recover from migrate folio
2024-06-06 22:28 ` Jane Chu
@ 2024-06-06 22:31 ` Jane Chu
2024-06-07 4:01 ` Kefeng Wang
0 siblings, 1 reply; 25+ messages in thread
From: Jane Chu @ 2024-06-06 22:31 UTC (permalink / raw)
To: Kefeng Wang, akpm, linux-mm
On 6/6/2024 3:28 PM, Jane Chu wrote:
> On 6/6/2024 2:27 PM, Jane Chu wrote:
>
>> On 6/3/2024 2:24 AM, Kefeng Wang wrote:
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index e930376c261a..28aa9da95781 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -663,16 +663,29 @@ static int __migrate_folio(struct
>>> address_space *mapping, struct folio *dst,
>>> struct folio *src, void *src_private,
>>> enum migrate_mode mode)
>>> {
>>> - int rc;
>>> + int ret, expected_cnt = folio_expected_refs(mapping, src);
>>> - rc = folio_migrate_mapping(mapping, dst, src, 0);
>>> - if (rc != MIGRATEPAGE_SUCCESS)
>>> - return rc;
>>> + if (!mapping) {
>>> + if (folio_ref_count(src) != expected_cnt)
>>> + return -EAGAIN;
>>> + } else {
>>> + if (!folio_ref_freeze(src, expected_cnt))
>>> + return -EAGAIN;
>>> + }
>>> +
>>
>> Let me take a guess, the reason you split up folio_migrate_copy() is
>> that
>>
>> folio_mc_copy() should be done before the 'src' folio's ->flags is
>> changed, right?
>>
>> Is there any other reason? Could you add a comment please?
>
> I see, both the clearing of the 'dirty' bit in the source folio, and
> the xas_store of the
>
> new folio to the mapping, these need to be done after folio_mc_copy
> considering in the
>
> event of UE, memory_failure() is called to handle the poison in the
> source page.
>
> That said, since the poisoned page was queued up and handling is
> asynchronous, so in
>
> theory, there is an extremely unlikely chance that memory_failure() is
> invoked after
>
> folio_migrate_mapping(), do you think things would still be cool?
Hmm, perhaps after xas_store, the source folio->mapping should be set to
NULL.
thanks,
-jane
>
>
> thanks,
>
> -jane
>
>>
>>> + ret = folio_mc_copy(dst, src);
>>> + if (unlikely(ret)) {
>>> + if (mapping)
>>> + folio_ref_unfreeze(src, expected_cnt);
>>> + return ret;
>>> + }
>>> +
>>> + __folio_migrate_mapping(mapping, dst, src, expected_cnt);
>>> if (src_private)
>>> folio_attach_private(dst, folio_detach_private(src));
>>> - folio_migrate_copy(dst, src);
>>> + folio_migrate_flags(dst, src);
>>> return MIGRATEPAGE_SUCCESS;
>>> }
>>
>> thanks,
>>
>> -jane
>>
>>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 5/6] fs: hugetlbfs: support poison recover from hugetlbfs_migrate_folio()
2024-06-03 9:24 ` [PATCH v4 5/6] fs: hugetlbfs: support poison recover from hugetlbfs_migrate_folio() Kefeng Wang
@ 2024-06-06 23:30 ` Jane Chu
0 siblings, 0 replies; 25+ messages in thread
From: Jane Chu @ 2024-06-06 23:30 UTC (permalink / raw)
To: Kefeng Wang, akpm, linux-mm
Cc: Tony Luck, Miaohe Lin, nao.horiguchi, Matthew Wilcox,
David Hildenbrand, Muchun Song, Benjamin LaHaise, jglisse,
Jiaqi Yan, Hugh Dickins, Vishal Moola, Alistair Popple,
Oscar Salvador, Lance Yang
On 6/3/2024 2:24 AM, Kefeng Wang wrote:
> This is similar to __migrate_folio(), use folio_mc_copy() in HugeTLB
> folio migration to avoid panic when copy from poisoned folio.
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
> fs/hugetlbfs/inode.c | 2 +-
> mm/migrate.c | 14 +++++++++-----
> 2 files changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 6df794ed4066..1107e5aa8343 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -1128,7 +1128,7 @@ static int hugetlbfs_migrate_folio(struct address_space *mapping,
> hugetlb_set_folio_subpool(src, NULL);
> }
>
> - folio_migrate_copy(dst, src);
> + folio_migrate_flags(dst, src);
>
> return MIGRATEPAGE_SUCCESS;
> }
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 28aa9da95781..e9b52a86f539 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -532,15 +532,19 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
> struct folio *dst, struct folio *src)
> {
> XA_STATE(xas, &mapping->i_pages, folio_index(src));
> - int expected_count;
> + int ret, expected_count = folio_expected_refs(mapping, src);
>
> - xas_lock_irq(&xas);
> - expected_count = folio_expected_refs(mapping, src);
> - if (!folio_ref_freeze(src, expected_count)) {
> - xas_unlock_irq(&xas);
> + if (!folio_ref_freeze(src, expected_count))
> return -EAGAIN;
> +
> + ret = folio_mc_copy(dst, src);
> + if (unlikely(ret)) {
> + folio_ref_unfreeze(src, expected_count);
> + return ret;
> }
>
> + xas_lock_irq(&xas);
> +
> dst->index = src->index;
> dst->mapping = src->mapping;
>
Looks good!
Reviewed-by: Jane Chu <jane.chu@oracle.com>
-jane
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 6/6] mm: migrate: remove folio_migrate_copy()
2024-06-03 9:24 ` [PATCH v4 6/6] mm: migrate: remove folio_migrate_copy() Kefeng Wang
@ 2024-06-06 23:46 ` Jane Chu
0 siblings, 0 replies; 25+ messages in thread
From: Jane Chu @ 2024-06-06 23:46 UTC (permalink / raw)
To: Kefeng Wang, akpm, linux-mm
Cc: Tony Luck, Miaohe Lin, nao.horiguchi, Matthew Wilcox,
David Hildenbrand, Muchun Song, Benjamin LaHaise, jglisse,
Jiaqi Yan, Hugh Dickins, Vishal Moola, Alistair Popple,
Oscar Salvador, Lance Yang
On 6/3/2024 2:24 AM, Kefeng Wang wrote:
> The folio_migrate_copy() is just a wrapper of folio_copy() and
> folio_migrate_flags(), it is simple and only aio use it for now,
> unfold it and remove folio_migrate_copy().
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
> fs/aio.c | 3 ++-
> include/linux/migrate.h | 1 -
> mm/migrate.c | 7 -------
> 3 files changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index 07ff8bbdcd2a..bcee11fcb08b 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -455,7 +455,8 @@ static int aio_migrate_folio(struct address_space *mapping, struct folio *dst,
> * events from being lost.
> */
> spin_lock_irqsave(&ctx->completion_lock, flags);
> - folio_migrate_copy(dst, src);
> + folio_copy(dst, src);
> + folio_migrate_flags(dst, src);
> BUG_ON(ctx->ring_folios[idx] != src);
> ctx->ring_folios[idx] = dst;
> spin_unlock_irqrestore(&ctx->completion_lock, flags);
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index 517f70b70620..f9d92482d117 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -76,7 +76,6 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
> void migration_entry_wait_on_locked(swp_entry_t entry, spinlock_t *ptl)
> __releases(ptl);
> void folio_migrate_flags(struct folio *newfolio, struct folio *folio);
> -void folio_migrate_copy(struct folio *newfolio, struct folio *folio);
> int folio_migrate_mapping(struct address_space *mapping,
> struct folio *newfolio, struct folio *folio, int extra_count);
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index e9b52a86f539..fe6ea3fb896e 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -652,13 +652,6 @@ void folio_migrate_flags(struct folio *newfolio, struct folio *folio)
> }
> EXPORT_SYMBOL(folio_migrate_flags);
>
> -void folio_migrate_copy(struct folio *newfolio, struct folio *folio)
> -{
> - folio_copy(newfolio, folio);
> - folio_migrate_flags(newfolio, folio);
> -}
> -EXPORT_SYMBOL(folio_migrate_copy);
> -
> /************************************************************
> * Migration functions
> ***********************************************************/
Looks good!
Reviewed-by: Jane Chu <jane.chu@oracle.com>
-jane
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 4/6] mm: migrate: support poisoned recover from migrate folio
2024-06-06 22:31 ` Jane Chu
@ 2024-06-07 4:01 ` Kefeng Wang
2024-06-07 15:59 ` Jane Chu
0 siblings, 1 reply; 25+ messages in thread
From: Kefeng Wang @ 2024-06-07 4:01 UTC (permalink / raw)
To: Jane Chu, akpm, linux-mm
On 2024/6/7 6:31, Jane Chu wrote:
>
> On 6/6/2024 3:28 PM, Jane Chu wrote:
>> On 6/6/2024 2:27 PM, Jane Chu wrote:
>>
>>> On 6/3/2024 2:24 AM, Kefeng Wang wrote:
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index e930376c261a..28aa9da95781 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -663,16 +663,29 @@ static int __migrate_folio(struct
>>>> address_space *mapping, struct folio *dst,
>>>> struct folio *src, void *src_private,
>>>> enum migrate_mode mode)
>>>> {
>>>> - int rc;
>>>> + int ret, expected_cnt = folio_expected_refs(mapping, src);
>>>> - rc = folio_migrate_mapping(mapping, dst, src, 0);
>>>> - if (rc != MIGRATEPAGE_SUCCESS)
>>>> - return rc;
>>>> + if (!mapping) {
>>>> + if (folio_ref_count(src) != expected_cnt)
>>>> + return -EAGAIN;
>>>> + } else {
>>>> + if (!folio_ref_freeze(src, expected_cnt))
>>>> + return -EAGAIN;
>>>> + }
>>>> +
>>>
>>> Let me take a guess, the reason you split up folio_migrate_copy() is
>>> that
>>>
>>> folio_mc_copy() should be done before the 'src' folio's ->flags is
>>> changed, right?
>>>
>>> Is there any other reason? Could you add a comment please?
>>
>> I see, both the clearing of the 'dirty' bit in the source folio, and
>> the xas_store of the
>>
>> new folio to the mapping, these need to be done after folio_mc_copy
>> considering in the
Yes, a lot of metadata is changed, and also some statistics (lruvec_state),
so we have to move folio_copy() ahead.
>>
>> event of UE, memory_failure() is called to handle the poison in the
>> source page.
>>
>> That said, since the poisoned page was queued up and handling is
>> asynchronous, so in
>>
>> theory, there is an extremely unlikely chance that memory_failure() is
>> invoked after
>>
>> folio_migrate_mapping(), do you think things would still be cool?
>
> Hmm, perhaps after xas_store, the source folio->mapping should be set to
> NULL.
When folio_mc_copy() returns -EHWPOISON, we never call
folio_migrate_mapping(), so the source folio is not changed and it
should be safe to handle it with an asynchronous memory_failure(),
or maybe I'm missing something?
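To make the ordering concrete, here is a simplified sketch of how
__migrate_folio() should look after this series; the name of the
split-out mapping helper (__folio_migrate_mapping(), from patch 3/6) and
the src_private handling are assumptions on my side, since only the
first hunk of the diff is quoted above:

static int __migrate_folio(struct address_space *mapping, struct folio *dst,
			   struct folio *src, void *src_private,
			   enum migrate_mode mode)
{
	int ret, expected_cnt = folio_expected_refs(mapping, src);

	/* Check, or freeze, the source refcount before touching anything. */
	if (!mapping) {
		if (folio_ref_count(src) != expected_cnt)
			return -EAGAIN;
	} else {
		if (!folio_ref_freeze(src, expected_cnt))
			return -EAGAIN;
	}

	/*
	 * Copy the data while only the refcount is frozen: on a machine
	 * check we unfreeze and fail the migration, leaving the source
	 * folio untouched for the queued memory_failure().
	 */
	ret = folio_mc_copy(dst, src);
	if (unlikely(ret)) {
		if (mapping)
			folio_ref_unfreeze(src, expected_cnt);
		return ret;
	}

	/* Only now transfer the mapping, statistics and folio flags. */
	ret = __folio_migrate_mapping(mapping, dst, src, expected_cnt);
	if (ret != MIGRATEPAGE_SUCCESS)
		return ret;

	if (src_private)
		folio_attach_private(dst, folio_detach_private(src));

	folio_migrate_flags(dst, src);

	return MIGRATEPAGE_SUCCESS;
}

With that ordering, a poisoned source is caught while its refcount is
merely frozen, so unfreezing restores the pre-migration state.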
PS: we tested this via error injection into a DIMM and then soft offlining the memory.
Thanks.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 4/6] mm: migrate: support poisoned recover from migrate folio
2024-06-07 4:01 ` Kefeng Wang
@ 2024-06-07 15:59 ` Jane Chu
0 siblings, 0 replies; 25+ messages in thread
From: Jane Chu @ 2024-06-07 15:59 UTC (permalink / raw)
To: Kefeng Wang, akpm, linux-mm
On 6/6/2024 9:01 PM, Kefeng Wang wrote:
>
>
> On 2024/6/7 6:31, Jane Chu wrote:
>>
>> On 6/6/2024 3:28 PM, Jane Chu wrote:
>>> On 6/6/2024 2:27 PM, Jane Chu wrote:
>>>
>>>> On 6/3/2024 2:24 AM, Kefeng Wang wrote:
>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>>> index e930376c261a..28aa9da95781 100644
>>>>> --- a/mm/migrate.c
>>>>> +++ b/mm/migrate.c
>>>>> @@ -663,16 +663,29 @@ static int __migrate_folio(struct
>>>>> address_space *mapping, struct folio *dst,
>>>>> struct folio *src, void *src_private,
>>>>> enum migrate_mode mode)
>>>>> {
>>>>> - int rc;
>>>>> + int ret, expected_cnt = folio_expected_refs(mapping, src);
>>>>> - rc = folio_migrate_mapping(mapping, dst, src, 0);
>>>>> - if (rc != MIGRATEPAGE_SUCCESS)
>>>>> - return rc;
>>>>> + if (!mapping) {
>>>>> + if (folio_ref_count(src) != expected_cnt)
>>>>> + return -EAGAIN;
>>>>> + } else {
>>>>> + if (!folio_ref_freeze(src, expected_cnt))
>>>>> + return -EAGAIN;
>>>>> + }
>>>>> +
>>>>
>>>> Let me take a guess, the reason you split up folio_migrate_copy()
>>>> is that
>>>>
>>>> folio_mc_copy() should be done before the 'src' folio's ->flags is
>>>> changed, right?
>>>>
>>>> Is there any other reason? Could you add a comment please?
>>>
>>> I see, both the clearing of the 'dirty' bit in the source folio, and
>>> the xas_store of the
>>>
>>> new folio to the mapping, these need to be done after folio_mc_copy
>>> considering in the
>
> Yes, a lot of metadata is changed, and also some statistics (lruvec_state),
> so we have to move folio_copy() ahead.
>
>
>>>
>>> event of UE, memory_failure() is called to handle the poison in the
>>> source page.
>>>
>>> That said, since the poisoned page was queued up and handling is
>>> asynchronous, so in
>>>
>>> theory, there is an extremely unlikely chance that memory_failure()
>>> is invoked after
>>>
>>> folio_migrate_mapping(), do you think things would still be cool?
>>
>> Hmm, perhaps after xas_store, the source folio->mapping should be set
>> to NULL.
>
> When folio_mc_copy() returns -EHWPOISON, we never call
> folio_migrate_mapping(), so the source folio is not changed and it
> should be safe to handle it with an asynchronous
> memory_failure(),
Right, I had overlooked this part, thanks!
> or maybe I'm missing something?
>
> PS: we tested this via error injection into a DIMM and then soft offlining the memory.
Got it.
Reviewed-by: Jane Chu <jane.chu@oracle.com>
thanks,
-jane
>
> Thanks.
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2024-06-07 15:59 UTC | newest]
Thread overview: 25+ messages
2024-06-03 9:24 [PATCH v4 0/6] mm: migrate: support poison recover from migrate folio Kefeng Wang
2024-06-03 9:24 ` [PATCH v4 1/6] mm: move memory_failure_queue() into copy_mc_[user]_highpage() Kefeng Wang
2024-06-04 19:38 ` Jane Chu
2024-06-06 2:28 ` Miaohe Lin
2024-06-03 9:24 ` [PATCH v4 2/6] mm: add folio_mc_copy() Kefeng Wang
2024-06-04 19:41 ` Jane Chu
2024-06-05 3:31 ` Andrew Morton
2024-06-05 7:15 ` Kefeng Wang
2024-06-06 2:36 ` Miaohe Lin
2024-06-03 9:24 ` [PATCH v4 3/6] mm: migrate: split folio_migrate_mapping() Kefeng Wang
2024-06-06 0:54 ` Jane Chu
2024-06-06 1:24 ` Kefeng Wang
2024-06-06 1:55 ` Matthew Wilcox
2024-06-06 2:24 ` Kefeng Wang
2024-06-06 18:28 ` Jane Chu
2024-06-03 9:24 ` [PATCH v4 4/6] mm: migrate: support poisoned recover from migrate folio Kefeng Wang
2024-06-06 21:27 ` Jane Chu
2024-06-06 22:28 ` Jane Chu
2024-06-06 22:31 ` Jane Chu
2024-06-07 4:01 ` Kefeng Wang
2024-06-07 15:59 ` Jane Chu
2024-06-03 9:24 ` [PATCH v4 5/6] fs: hugetlbfs: support poison recover from hugetlbfs_migrate_folio() Kefeng Wang
2024-06-06 23:30 ` Jane Chu
2024-06-03 9:24 ` [PATCH v4 6/6] mm: migrate: remove folio_migrate_copy() Kefeng Wang
2024-06-06 23:46 ` Jane Chu