linux-mm.kvack.org archive mirror
* [PATCH] mm/hugetlb: fix memory offline failure due to hwpoisoned file hugetlb
From: Jinjiang Tu @ 2026-03-18  2:07 UTC
  To: akpm, muchun.song, osalvador, david, linmiaohe, nao.horiguchi, linux-mm
  Cc: wangkefeng.wang, sunnanyong, tujinjiang

When a file-backed hugetlb folio hits an uncorrectable memory error (UCE),
me_huge_page() keeps the folio in the pagecache with its refcount raised
and PG_hwpoison set. As a result, the folio stays leaked even after the
hugetlb file is deleted.

If we then try to offline the memory block that the hwpoisoned hugetlb
folio belongs to, offlining fails in dissolve_free_hugetlb_folios()
because the hwpoisoned hugetlb folio is not free.

I can reproduce this issue with the following steps in qemu:
 1) echo offline >/sys/devices/system/memory/auto_online_blocks
 2) in qemu monitor:
       object_add memory-backend-ram,id=mem10,size=1G
       device_add pc-dimm,id=dimm1,memdev=mem10,node=2
 3) echo online_movable > /sys/devices/system/node/node2/memory136/state
 4) echo 5 > /sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages
 5) run ./hugetlb_file. This process will receive SIGBUS.
 6) remove the hugetlbfs file.
 7) echo offline > /sys/devices/system/node/node2/memory136/state

hugetlb_file.c:
  fd = open("/dev/hugepages/my_hugepage_file", O_CREAT | O_RDWR, 0755);
  fallocate(fd, 0, 0, HUGEPAGE_SIZE * 2);
  addr = mmap(NULL, HUGEPAGE_SIZE * 2, PROT_READ | PROT_WRITE,
		MAP_SHARED | MAP_HUGETLB, fd, 0);
  memset(addr, 0xaa, HUGEPAGE_SIZE * 2);
  madvise(addr, HUGEPAGE_SIZE, MADV_HWPOISON);
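
The fragment above omits headers and error handling. A self-contained
sketch of the same sequence could look like the following. The function
name poison_first_hugepage() and the 2 MiB value of HUGEPAGE_SIZE are
assumptions made for illustration (the size matches hugepages-2048kB in
step 4); they are not taken from the original test program.

```c
#define _GNU_SOURCE	/* for fallocate(), MAP_HUGETLB, MADV_HWPOISON */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: 2 MiB huge pages, matching hugepages-2048kB in step 4. */
#define HUGEPAGE_SIZE (2UL * 1024 * 1024)

#ifndef MADV_HWPOISON
#define MADV_HWPOISON 100	/* Linux value, for older libc headers */
#endif

/*
 * Hypothetical helper: create a two-hugepage hugetlbfs file at `path`,
 * fault it in and inject a software memory error into the first huge page.
 * Returns 0 on success, -1 on failure (with a message on stderr).
 * When the injection works, the caller is expected to be killed by SIGBUS,
 * so a normal return usually means that something failed.
 */
int poison_first_hugepage(const char *path)
{
	const size_t len = HUGEPAGE_SIZE * 2;
	char *addr;
	int fd, ret = -1;

	assert(path != NULL);

	fd = open(path, O_CREAT | O_RDWR, 0755);
	if (fd < 0) {
		perror("open");
		return -1;
	}

	/* Reserve two huge pages for the file. */
	if (fallocate(fd, 0, 0, len) < 0) {
		perror("fallocate");
		goto out_close;
	}

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_HUGETLB, fd, 0);
	if (addr == MAP_FAILED) {
		perror("mmap");
		goto out_close;
	}

	/* Touch both huge pages so that they are added to the pagecache. */
	memset(addr, 0xaa, len);

	/* Needs CAP_SYS_ADMIN and a kernel with CONFIG_MEMORY_FAILURE. */
	if (madvise(addr, HUGEPAGE_SIZE, MADV_HWPOISON) < 0)
		perror("madvise(MADV_HWPOISON)");
	else
		ret = 0;

	munmap(addr, len);
out_close:
	close(fd);
	return ret;
}
```

Calling poison_first_hugepage("/dev/hugepages/my_hugepage_file") from
main() as root corresponds to step 5.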

To fix it, when deleting a hwpoisoned hugetlb folio from the pagecache,
mark the folio as temporary and drop the reference that memory-failure
took. Once the folio has been deleted from the pagecache, its refcount
drops to zero and the folio is dissolved.
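
The refcount reasoning can be walked through with a toy userspace model.
This is only a sketch and not kernel code: struct toy_folio, the toy_*
helpers and the concrete reference counts (one for the pagecache, one
kept by memory-failure, one held by the lookup while the folio is
processed) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the relevant folio state; NOT the kernel's struct folio. */
struct toy_folio {
	int  refcount;
	bool in_pagecache;
	bool hwpoison;
	bool temporary;
	bool dissolved;
};

/* Drop one reference; a temporary folio is dissolved when it hits zero. */
void toy_folio_put(struct toy_folio *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0 && f->temporary)
		f->dissolved = true;
}

/*
 * Assumed state after memory-failure has handled an error on a pagecache
 * folio: one reference for the pagecache, one kept by memory-failure.
 */
struct toy_folio toy_poisoned_pagecache_folio(void)
{
	struct toy_folio f = {
		.refcount = 2,
		.in_pagecache = true,
		.hwpoison = true,
	};
	return f;
}

/*
 * Model of deleting the folio from the pagecache. `apply_fix` selects the
 * behaviour with the extra put added by the patch. Returns the refcount
 * that is left once the lookup reference has been released.
 */
int toy_remove_from_pagecache(struct toy_folio *f, bool apply_fix)
{
	f->refcount++;			/* lookup reference */

	f->in_pagecache = false;
	toy_folio_put(f);		/* pagecache reference */

	if (apply_fix && f->hwpoison) {
		f->temporary = true;
		toy_folio_put(f);	/* reference kept by memory-failure */
	}

	toy_folio_put(f);		/* lookup reference released */
	return f->refcount;
}
```

In this model, the call without the fix leaves one reference behind, so
the folio is never freed. With the fix, the last put brings the count to
zero while the folio is marked temporary, so it is dissolved.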

Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 fs/hugetlbfs/inode.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 3f70c47981de..6bebe2e67f3e 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -603,6 +603,11 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 							index, truncate_op))
 				freed++;
 
+			if (unlikely(folio_test_hwpoison(folio))) {
+				folio_set_hugetlb_temporary(folio);
+				folio_put(folio);
+			}
+
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		}
 		folio_batch_release(&fbatch);
-- 
2.43.0




Thread overview: 6+ messages
2026-03-18  2:07 [PATCH] mm/hugetlb: fix memory offline failure due to hwpoisoned file hugetlb Jinjiang Tu
2026-03-20  2:34 ` Miaohe Lin
2026-03-20  2:37   ` Miaohe Lin
2026-03-20  2:43   ` Jinjiang Tu
2026-03-20  3:03     ` Miaohe Lin
2026-03-20  3:15       ` Jinjiang Tu
