linux-mm.kvack.org archive mirror
* [PATCH v4 0/4] Improve hugetlbfs read on HWPOISON hugepages
@ 2023-07-13  0:18 Jiaqi Yan
  2023-07-13  0:18 ` [PATCH v4 1/4] mm/hwpoison: delete all entries before traversal in __folio_free_raw_hwp Jiaqi Yan
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Jiaqi Yan @ 2023-07-13  0:18 UTC (permalink / raw)
  To: linmiaohe, mike.kravetz, naoya.horiguchi
  Cc: akpm, songmuchun, shy828301, linux-mm, linux-kernel, jthoughton,
	Jiaqi Yan

Today when hardware memory is corrupted in a hugetlb hugepage, the
kernel leaves the hugepage in the pagecache [1]; otherwise future mmap or
read would be subject to silent data corruption. This is implemented by
returning -EIO from hugetlbfs_read_iter immediately if the hugepage has
the HWPOISON flag set.

Since memory_failure already tracks the raw HWPOISON subpages in a
hugepage, a natural improvement is possible: if userspace only asks for
healthy subpages in the pagecache, the kernel can return this data.

This patchset implements this improvement. It consists of four parts.
The 1st commit fixes the order in which __folio_free_raw_hwp deletes and
frees raw_hwp_list entries. The 2nd commit exports the functionality to
tell if a subpage inside a hugetlb hugepage is a raw HWPOISON page. The
3rd commit teaches hugetlbfs_read_iter to return as many healthy bytes
as possible. The 4th commit properly tests this new feature.

[1] commit 8625147cafaa ("hugetlbfs: don't delete error page from pagecache")

Changelog

v3 => v4
* incorporate feedback from Matthew Wilcox <willy@infradead.org>,
  Miaohe Lin <linmiaohe@huawei.com>, Mike Kravetz
  <mike.kravetz@oracle.com> and Naoya Horiguchi
  <naoya.horiguchi@nec.com>.
* rename is_raw_hwp_subpage => is_raw_hwpoison_page_in_hugepage.
* instead of taking hugetlb_lock, is_raw_hwpoison_page_in_hugepage needs
  to hold mf_mutex.
* use llist_for_each_entry, instead of llist_for_each_entry_safe in
  is_raw_hwpoison_page_in_hugepage.
* is_raw_hwpoison_page_in_hugepage doesn't need the folio argument.
* no need to export struct raw_hwp_page to header file.
* v4 is based on commit fd3006d2d0e7 ("Sync mm-stable with v6.5-rc1.").

v2 => v3
* Update commit messages so future readers know the background and
  code details.
* v3 is based on commit 5bb367dca2b9 ("Merge branch 'master' into
  mm-stable").

v1 => v2
* incorporate feedback from both Mike Kravetz
  <mike.kravetz@oracle.com> and Naoya Horiguchi
  <naoya.horiguchi@nec.com>.
* __folio_free_raw_hwp deletes all entries in raw_hwp_list before it
  traverses and frees raw_hwp_page.
* find_raw_hwp_page => __is_raw_hwp_subpage and __is_raw_hwp_subpage
  only returns bool instead of a raw_hwp_page entry.
* is_raw_hwp_subpage holds hugetlb_lock while checking
  __is_raw_hwp_subpage.
* No need to do folio_lock in adjust_range_hwpoison.
* v2 is based on commit a6e79df92e4a ("mm/gup: disallow FOLL_LONGTERM
  GUP-fast writing to file-backed mappings").

Jiaqi Yan (4):
  mm/hwpoison: delete all entries before traversal in
    __folio_free_raw_hwp
  mm/hwpoison: check if a raw page in a hugetlb folio is raw HWPOISON
  hugetlbfs: improve read HWPOISON hugepage
  selftests/mm: add tests for HWPOISON hugetlbfs read

 fs/hugetlbfs/inode.c                          |  57 +++-
 include/linux/hugetlb.h                       |   5 +
 mm/memory-failure.c                           |  48 ++-
 tools/testing/selftests/mm/.gitignore         |   1 +
 tools/testing/selftests/mm/Makefile           |   1 +
 .../selftests/mm/hugetlb-read-hwpoison.c      | 322 ++++++++++++++++++
 6 files changed, 421 insertions(+), 13 deletions(-)
 create mode 100644 tools/testing/selftests/mm/hugetlb-read-hwpoison.c

-- 
2.41.0.255.g8b1d071c50-goog



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4 1/4] mm/hwpoison: delete all entries before traversal in __folio_free_raw_hwp
  2023-07-13  0:18 [PATCH v4 0/4] Improve hugetlbfs read on HWPOISON hugepages Jiaqi Yan
@ 2023-07-13  0:18 ` Jiaqi Yan
  2023-07-13  0:18 ` [PATCH v4 2/4] mm/hwpoison: check if a raw page in a hugetlb folio is raw HWPOISON Jiaqi Yan
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 19+ messages in thread
From: Jiaqi Yan @ 2023-07-13  0:18 UTC (permalink / raw)
  To: linmiaohe, mike.kravetz, naoya.horiguchi
  Cc: akpm, songmuchun, shy828301, linux-mm, linux-kernel, jthoughton,
	Jiaqi Yan

Traversal on llist (e.g. llist_for_each_safe) is only safe AFTER entries
are deleted from the llist. Correct the way __folio_free_raw_hwp deletes
and frees raw_hwp_page entries in raw_hwp_list: first llist_del_all, then
kfree within llist_for_each_safe.

As of today, concurrent adding, deleting, and traversal on raw_hwp_list
from hugetlb.c and/or memory-failure.c are fine with each other. Note
this is guaranteed partly by the lock-free nature of llist, and partly
by holding hugetlb_lock and/or mf_mutex. For example, as llist_del_all
is lock-free with itself, folio_clear_hugetlb_hwpoison()s from
__update_and_free_hugetlb_folio and memory_failure won't need explicit
locking when freeing the raw_hwp_list. New code that manipulates
raw_hwp_list must take care to ensure concurrency correctness.

Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
 mm/memory-failure.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index e245191e6b04..a08677dcf953 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1829,12 +1829,11 @@ static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
 
 static unsigned long __folio_free_raw_hwp(struct folio *folio, bool move_flag)
 {
-	struct llist_head *head;
-	struct llist_node *t, *tnode;
+	struct llist_node *t, *tnode, *head;
 	unsigned long count = 0;
 
-	head = raw_hwp_list_head(folio);
-	llist_for_each_safe(tnode, t, head->first) {
+	head = llist_del_all(raw_hwp_list_head(folio));
+	llist_for_each_safe(tnode, t, head) {
 		struct raw_hwp_page *p = container_of(tnode, struct raw_hwp_page, node);
 
 		if (move_flag)
@@ -1844,7 +1843,6 @@ static unsigned long __folio_free_raw_hwp(struct folio *folio, bool move_flag)
 		kfree(p);
 		count++;
 	}
-	llist_del_all(head);
 	return count;
 }
 
-- 
2.41.0.255.g8b1d071c50-goog



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4 2/4] mm/hwpoison: check if a raw page in a hugetlb folio is raw HWPOISON
  2023-07-13  0:18 [PATCH v4 0/4] Improve hugetlbfs read on HWPOISON hugepages Jiaqi Yan
  2023-07-13  0:18 ` [PATCH v4 1/4] mm/hwpoison: delete all entries before traversal in __folio_free_raw_hwp Jiaqi Yan
@ 2023-07-13  0:18 ` Jiaqi Yan
  2023-07-13  0:18 ` [PATCH v4 3/4] hugetlbfs: improve read HWPOISON hugepage Jiaqi Yan
  2023-07-13  0:18 ` [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read Jiaqi Yan
  3 siblings, 0 replies; 19+ messages in thread
From: Jiaqi Yan @ 2023-07-13  0:18 UTC (permalink / raw)
  To: linmiaohe, mike.kravetz, naoya.horiguchi
  Cc: akpm, songmuchun, shy828301, linux-mm, linux-kernel, jthoughton,
	Jiaqi Yan

Add the functionality, is_raw_hwpoison_page_in_hugepage, to tell if a
raw page in a hugetlb folio is HWPOISON. This functionality relies on
RawHwpUnreliable not being set; otherwise the hugepage's raw HWPOISON
list is meaningless.

is_raw_hwpoison_page_in_hugepage holds mf_mutex in order to synchronize
with folio_set_hugetlb_hwpoison and folio_free_raw_hwp, which iterate
over, insert into, or delete entries from raw_hwp_list. llist itself
doesn't ensure insertion and removal are synchronized with the
llist_for_each_entry used by is_raw_hwpoison_page_in_hugepage (unless
iterated entries are already deleted from the list). Callers can
minimize the overhead of lock cycles by first checking the HWPOISON
flag of the folio.

Export this functionality so it can be immediately used in the read
operation for hugetlbfs.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
 include/linux/hugetlb.h |  5 +++++
 mm/memory-failure.c     | 40 ++++++++++++++++++++++++++++++++++++++--
 2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ca3c8e10f24a..0a96cfacb746 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1007,6 +1007,11 @@ void hugetlb_register_node(struct node *node);
 void hugetlb_unregister_node(struct node *node);
 #endif
 
+/*
+ * Check if a given raw @page in a hugepage is HWPOISON.
+ */
+bool is_raw_hwpoison_page_in_hugepage(struct page *page);
+
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a08677dcf953..d610d8f03f69 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -75,6 +75,8 @@ atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
 static bool hw_memory_failure __read_mostly = false;
 
+static DEFINE_MUTEX(mf_mutex);
+
 inline void num_poisoned_pages_inc(unsigned long pfn)
 {
 	atomic_long_inc(&num_poisoned_pages);
@@ -1813,6 +1815,7 @@ EXPORT_SYMBOL_GPL(mf_dax_kill_procs);
 #endif /* CONFIG_FS_DAX */
 
 #ifdef CONFIG_HUGETLB_PAGE
+
 /*
  * Struct raw_hwp_page represents information about "raw error page",
  * constructing singly linked list from ->_hugetlb_hwpoison field of folio.
@@ -1827,6 +1830,41 @@ static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
 	return (struct llist_head *)&folio->_hugetlb_hwpoison;
 }
 
+bool is_raw_hwpoison_page_in_hugepage(struct page *page)
+{
+	struct llist_head *raw_hwp_head;
+	struct raw_hwp_page *p;
+	struct folio *folio = page_folio(page);
+	bool ret = false;
+
+	if (!folio_test_hwpoison(folio))
+		return false;
+
+	if (!folio_test_hugetlb(folio))
+		return PageHWPoison(page);
+
+	/*
+	 * When RawHwpUnreliable is set, kernel lost track of which subpages
+	 * are HWPOISON. So return as if ALL subpages are HWPOISONed.
+	 */
+	if (folio_test_hugetlb_raw_hwp_unreliable(folio))
+		return true;
+
+	mutex_lock(&mf_mutex);
+
+	raw_hwp_head = raw_hwp_list_head(folio);
+	llist_for_each_entry(p, raw_hwp_head->first, node) {
+		if (page == p->page) {
+			ret = true;
+			break;
+		}
+	}
+
+	mutex_unlock(&mf_mutex);
+
+	return ret;
+}
+
 static unsigned long __folio_free_raw_hwp(struct folio *folio, bool move_flag)
 {
 	struct llist_node *t, *tnode, *head;
@@ -2106,8 +2144,6 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 	return rc;
 }
 
-static DEFINE_MUTEX(mf_mutex);
-
 /**
  * memory_failure - Handle memory failure of a page.
  * @pfn: Page Number of the corrupted page
-- 
2.41.0.255.g8b1d071c50-goog



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4 3/4] hugetlbfs: improve read HWPOISON hugepage
  2023-07-13  0:18 [PATCH v4 0/4] Improve hugetlbfs read on HWPOISON hugepages Jiaqi Yan
  2023-07-13  0:18 ` [PATCH v4 1/4] mm/hwpoison: delete all entries before traversal in __folio_free_raw_hwp Jiaqi Yan
  2023-07-13  0:18 ` [PATCH v4 2/4] mm/hwpoison: check if a raw page in a hugetlb folio is raw HWPOISON Jiaqi Yan
@ 2023-07-13  0:18 ` Jiaqi Yan
  2023-07-13  0:18 ` [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read Jiaqi Yan
  3 siblings, 0 replies; 19+ messages in thread
From: Jiaqi Yan @ 2023-07-13  0:18 UTC (permalink / raw)
  To: linmiaohe, mike.kravetz, naoya.horiguchi
  Cc: akpm, songmuchun, shy828301, linux-mm, linux-kernel, jthoughton,
	Jiaqi Yan

When a hugepage contains HWPOISON pages, read() fails to read any byte
of the hugepage and returns -EIO, although many bytes in the HWPOISON
hugepage are readable.

Improve this by allowing hugetlbfs_read_iter to return as many bytes as
possible. For a requested range [offset, offset + len) that contains a
HWPOISON page, return [offset, first HWPOISON page addr); the next read
attempt will fail and return -EIO.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
 fs/hugetlbfs/inode.c | 57 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 51 insertions(+), 6 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 7b17ccfa039d..e7611ae1e612 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -282,6 +282,41 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 }
 #endif
 
+/*
+ * Someone wants to read @bytes from a HWPOISON hugetlb @page from @offset.
+ * Returns the maximum number of bytes one can read without touching the 1st raw
+ * HWPOISON subpage.
+ *
+ * The implementation borrows the iteration logic from copy_page_to_iter*.
+ */
+static size_t adjust_range_hwpoison(struct page *page, size_t offset, size_t bytes)
+{
+	size_t n = 0;
+	size_t res = 0;
+
+	/* First subpage to start the loop. */
+	page += offset / PAGE_SIZE;
+	offset %= PAGE_SIZE;
+	while (1) {
+		if (is_raw_hwpoison_page_in_hugepage(page))
+			break;
+
+		/* Safe to read n bytes without touching HWPOISON subpage. */
+		n = min(bytes, (size_t)PAGE_SIZE - offset);
+		res += n;
+		bytes -= n;
+		if (!bytes || !n)
+			break;
+		offset += n;
+		if (offset == PAGE_SIZE) {
+			page++;
+			offset = 0;
+		}
+	}
+
+	return res;
+}
+
 /*
  * Support for read() - Find the page attached to f_mapping and copy out the
  * data. This provides functionality similar to filemap_read().
@@ -300,7 +335,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
 
 	while (iov_iter_count(to)) {
 		struct page *page;
-		size_t nr, copied;
+		size_t nr, copied, want;
 
 		/* nr is the maximum number of bytes to copy from this page */
 		nr = huge_page_size(h);
@@ -328,16 +363,26 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		} else {
 			unlock_page(page);
 
-			if (PageHWPoison(page)) {
-				put_page(page);
-				retval = -EIO;
-				break;
+			if (!PageHWPoison(page))
+				want = nr;
+			else {
+				/*
+				 * Adjust how many bytes safe to read without
+				 * touching the 1st raw HWPOISON subpage after
+				 * offset.
+				 */
+				want = adjust_range_hwpoison(page, offset, nr);
+				if (want == 0) {
+					put_page(page);
+					retval = -EIO;
+					break;
+				}
 			}
 
 			/*
 			 * We have the page, copy it to user space buffer.
 			 */
-			copied = copy_page_to_iter(page, offset, nr, to);
+			copied = copy_page_to_iter(page, offset, want, to);
 			put_page(page);
 		}
 		offset += copied;
-- 
2.41.0.255.g8b1d071c50-goog



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read
  2023-07-13  0:18 [PATCH v4 0/4] Improve hugetlbfs read on HWPOISON hugepages Jiaqi Yan
                   ` (2 preceding siblings ...)
  2023-07-13  0:18 ` [PATCH v4 3/4] hugetlbfs: improve read HWPOISON hugepage Jiaqi Yan
@ 2023-07-13  0:18 ` Jiaqi Yan
  2024-01-05  6:27   ` Muhammad Usama Anjum
  3 siblings, 1 reply; 19+ messages in thread
From: Jiaqi Yan @ 2023-07-13  0:18 UTC (permalink / raw)
  To: linmiaohe, mike.kravetz, naoya.horiguchi
  Cc: akpm, songmuchun, shy828301, linux-mm, linux-kernel, jthoughton,
	Jiaqi Yan

Add tests for the improvement made to the read operation on HWPOISON
hugetlb pages with different read granularities. For each chunk size,
three read scenarios are tested:
1. Simple regression test on read without HWPOISON.
2. Sequential read page by page should succeed until it encounters the 1st
   raw HWPOISON subpage.
3. After skipping a raw HWPOISON subpage via lseek, read()s always succeed.

Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
 tools/testing/selftests/mm/.gitignore         |   1 +
 tools/testing/selftests/mm/Makefile           |   1 +
 .../selftests/mm/hugetlb-read-hwpoison.c      | 322 ++++++++++++++++++
 3 files changed, 324 insertions(+)
 create mode 100644 tools/testing/selftests/mm/hugetlb-read-hwpoison.c

diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore
index 7e2a982383c0..cdc9ce4426b9 100644
--- a/tools/testing/selftests/mm/.gitignore
+++ b/tools/testing/selftests/mm/.gitignore
@@ -5,6 +5,7 @@ hugepage-mremap
 hugepage-shm
 hugepage-vmemmap
 hugetlb-madvise
+hugetlb-read-hwpoison
 khugepaged
 map_hugetlb
 map_populate
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 66d7c07dc177..b7fce9073279 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -41,6 +41,7 @@ TEST_GEN_PROGS += gup_longterm
 TEST_GEN_PROGS += gup_test
 TEST_GEN_PROGS += hmm-tests
 TEST_GEN_PROGS += hugetlb-madvise
+TEST_GEN_PROGS += hugetlb-read-hwpoison
 TEST_GEN_PROGS += hugepage-mmap
 TEST_GEN_PROGS += hugepage-mremap
 TEST_GEN_PROGS += hugepage-shm
diff --git a/tools/testing/selftests/mm/hugetlb-read-hwpoison.c b/tools/testing/selftests/mm/hugetlb-read-hwpoison.c
new file mode 100644
index 000000000000..ba6cc6f9cabc
--- /dev/null
+++ b/tools/testing/selftests/mm/hugetlb-read-hwpoison.c
@@ -0,0 +1,322 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+
+#include <linux/magic.h>
+#include <sys/mman.h>
+#include <sys/statfs.h>
+#include <errno.h>
+#include <stdbool.h>
+
+#include "../kselftest.h"
+
+#define PREFIX " ... "
+#define ERROR_PREFIX " !!! "
+
+#define MAX_WRITE_READ_CHUNK_SIZE (getpagesize() * 16)
+#define MAX(a, b) (((a) > (b)) ? (a) : (b))
+
+enum test_status {
+	TEST_PASSED = 0,
+	TEST_FAILED = 1,
+	TEST_SKIPPED = 2,
+};
+
+static char *status_to_str(enum test_status status)
+{
+	switch (status) {
+	case TEST_PASSED:
+		return "TEST_PASSED";
+	case TEST_FAILED:
+		return "TEST_FAILED";
+	case TEST_SKIPPED:
+		return "TEST_SKIPPED";
+	default:
+		return "TEST_???";
+	}
+}
+
+static int setup_filemap(char *filemap, size_t len, size_t wr_chunk_size)
+{
+	char iter = 0;
+
+	for (size_t offset = 0; offset < len;
+	     offset += wr_chunk_size) {
+		iter++;
+		memset(filemap + offset, iter, wr_chunk_size);
+	}
+
+	return 0;
+}
+
+static bool verify_chunk(char *buf, size_t len, char val)
+{
+	size_t i;
+
+	for (i = 0; i < len; ++i) {
+		if (buf[i] != val) {
+			printf(PREFIX ERROR_PREFIX "check fail: buf[%lu] = %u != %u\n",
+				i, buf[i], val);
+			return false;
+		}
+	}
+
+	return true;
+}
+
+static bool seek_read_hugepage_filemap(int fd, size_t len, size_t wr_chunk_size,
+				       off_t offset, size_t expected)
+{
+	char buf[MAX_WRITE_READ_CHUNK_SIZE];
+	ssize_t ret_count = 0;
+	ssize_t total_ret_count = 0;
+	char val = offset / wr_chunk_size + offset % wr_chunk_size;
+
+	printf(PREFIX PREFIX "init val=%u with offset=0x%lx\n", val, offset);
+	printf(PREFIX PREFIX "expect to read 0x%lx bytes of data in total\n",
+	       expected);
+	if (lseek(fd, offset, SEEK_SET) < 0) {
+		perror(PREFIX ERROR_PREFIX "seek failed");
+		return false;
+	}
+
+	while (offset + total_ret_count < len) {
+		ret_count = read(fd, buf, wr_chunk_size);
+		if (ret_count == 0) {
+			printf(PREFIX PREFIX "read reach end of the file\n");
+			break;
+		} else if (ret_count < 0) {
+			perror(PREFIX ERROR_PREFIX "read failed");
+			break;
+		}
+		++val;
+		if (!verify_chunk(buf, ret_count, val))
+			return false;
+
+		total_ret_count += ret_count;
+	}
+	printf(PREFIX PREFIX "actually read 0x%lx bytes of data in total\n",
+	       total_ret_count);
+
+	return total_ret_count == expected;
+}
+
+static bool read_hugepage_filemap(int fd, size_t len,
+				  size_t wr_chunk_size, size_t expected)
+{
+	char buf[MAX_WRITE_READ_CHUNK_SIZE];
+	ssize_t ret_count = 0;
+	ssize_t total_ret_count = 0;
+	char val = 0;
+
+	printf(PREFIX PREFIX "expect to read 0x%lx bytes of data in total\n",
+	       expected);
+	while (total_ret_count < len) {
+		ret_count = read(fd, buf, wr_chunk_size);
+		if (ret_count == 0) {
+			printf(PREFIX PREFIX "read reach end of the file\n");
+			break;
+		} else if (ret_count < 0) {
+			perror(PREFIX ERROR_PREFIX "read failed");
+			break;
+		}
+		++val;
+		if (!verify_chunk(buf, ret_count, val))
+			return false;
+
+		total_ret_count += ret_count;
+	}
+	printf(PREFIX PREFIX "actually read 0x%lx bytes of data in total\n",
+	       total_ret_count);
+
+	return total_ret_count == expected;
+}
+
+static enum test_status
+test_hugetlb_read(int fd, size_t len, size_t wr_chunk_size)
+{
+	enum test_status status = TEST_SKIPPED;
+	char *filemap = NULL;
+
+	if (ftruncate(fd, len) < 0) {
+		perror(PREFIX ERROR_PREFIX "ftruncate failed");
+		return status;
+	}
+
+	filemap = mmap(NULL, len, PROT_READ | PROT_WRITE,
+		       MAP_SHARED | MAP_POPULATE, fd, 0);
+	if (filemap == MAP_FAILED) {
+		perror(PREFIX ERROR_PREFIX "mmap for primary mapping failed");
+		goto done;
+	}
+
+	setup_filemap(filemap, len, wr_chunk_size);
+	status = TEST_FAILED;
+
+	if (read_hugepage_filemap(fd, len, wr_chunk_size, len))
+		status = TEST_PASSED;
+
+	munmap(filemap, len);
+done:
+	if (ftruncate(fd, 0) < 0) {
+		perror(PREFIX ERROR_PREFIX "ftruncate back to 0 failed");
+		status = TEST_FAILED;
+	}
+
+	return status;
+}
+
+static enum test_status
+test_hugetlb_read_hwpoison(int fd, size_t len, size_t wr_chunk_size,
+			   bool skip_hwpoison_page)
+{
+	enum test_status status = TEST_SKIPPED;
+	char *filemap = NULL;
+	char *hwp_addr = NULL;
+	const unsigned long pagesize = getpagesize();
+
+	if (ftruncate(fd, len) < 0) {
+		perror(PREFIX ERROR_PREFIX "ftruncate failed");
+		return status;
+	}
+
+	filemap = mmap(NULL, len, PROT_READ | PROT_WRITE,
+		       MAP_SHARED | MAP_POPULATE, fd, 0);
+	if (filemap == MAP_FAILED) {
+		perror(PREFIX ERROR_PREFIX "mmap for primary mapping failed");
+		goto done;
+	}
+
+	setup_filemap(filemap, len, wr_chunk_size);
+	status = TEST_FAILED;
+
+	/*
+	 * Poisoned hugetlb page layout (assume hugepagesize=2MB):
+	 * |<---------------------- 1MB ---------------------->|
+	 * |<---- healthy page ---->|<---- HWPOISON page ----->|
+	 * |<------------------- (1MB - 8KB) ----------------->|
+	 */
+	hwp_addr = filemap + len / 2 + pagesize;
+	if (madvise(hwp_addr, pagesize, MADV_HWPOISON) < 0) {
+		perror(PREFIX ERROR_PREFIX "MADV_HWPOISON failed");
+		goto unmap;
+	}
+
+	if (!skip_hwpoison_page) {
+		/*
+		 * Userspace should be able to read (1MB + 1 page) from
+		 * the beginning of the HWPOISONed hugepage.
+		 */
+		if (read_hugepage_filemap(fd, len, wr_chunk_size,
+					  len / 2 + pagesize))
+			status = TEST_PASSED;
+	} else {
+		/*
+		 * Userspace should be able to read (1MB - 2 pages) from
+		 * HWPOISONed hugepage.
+		 */
+		if (seek_read_hugepage_filemap(fd, len, wr_chunk_size,
+					       len / 2 + MAX(2 * pagesize, wr_chunk_size),
+					       len / 2 - MAX(2 * pagesize, wr_chunk_size)))
+			status = TEST_PASSED;
+	}
+
+unmap:
+	munmap(filemap, len);
+done:
+	if (ftruncate(fd, 0) < 0) {
+		perror(PREFIX ERROR_PREFIX "ftruncate back to 0 failed");
+		status = TEST_FAILED;
+	}
+
+	return status;
+}
+
+static int create_hugetlbfs_file(struct statfs *file_stat)
+{
+	int fd;
+
+	fd = memfd_create("hugetlb_tmp", MFD_HUGETLB);
+	if (fd < 0) {
+		perror(PREFIX ERROR_PREFIX "could not open hugetlbfs file");
+		return -1;
+	}
+
+	memset(file_stat, 0, sizeof(*file_stat));
+	if (fstatfs(fd, file_stat)) {
+		perror(PREFIX ERROR_PREFIX "fstatfs failed");
+		goto close;
+	}
+	if (file_stat->f_type != HUGETLBFS_MAGIC) {
+		printf(PREFIX ERROR_PREFIX "not hugetlbfs file\n");
+		goto close;
+	}
+
+	return fd;
+close:
+	close(fd);
+	return -1;
+}
+
+int main(void)
+{
+	int fd;
+	struct statfs file_stat;
+	enum test_status status;
+	/* Test read() in different granularity. */
+	size_t wr_chunk_sizes[] = {
+		getpagesize() / 2, getpagesize(),
+		getpagesize() * 2, getpagesize() * 4
+	};
+	size_t i;
+
+	for (i = 0; i < ARRAY_SIZE(wr_chunk_sizes); ++i) {
+		printf("Write/read chunk size=0x%lx\n",
+		       wr_chunk_sizes[i]);
+
+		fd = create_hugetlbfs_file(&file_stat);
+		if (fd < 0)
+			goto create_failure;
+		printf(PREFIX "HugeTLB read regression test...\n");
+		status = test_hugetlb_read(fd, file_stat.f_bsize,
+					   wr_chunk_sizes[i]);
+		printf(PREFIX "HugeTLB read regression test...%s\n",
+		       status_to_str(status));
+		close(fd);
+		if (status == TEST_FAILED)
+			return -1;
+
+		fd = create_hugetlbfs_file(&file_stat);
+		if (fd < 0)
+			goto create_failure;
+		printf(PREFIX "HugeTLB read HWPOISON test...\n");
+		status = test_hugetlb_read_hwpoison(fd, file_stat.f_bsize,
+						    wr_chunk_sizes[i], false);
+		printf(PREFIX "HugeTLB read HWPOISON test...%s\n",
+		       status_to_str(status));
+		close(fd);
+		if (status == TEST_FAILED)
+			return -1;
+
+		fd = create_hugetlbfs_file(&file_stat);
+		if (fd < 0)
+			goto create_failure;
+		printf(PREFIX "HugeTLB seek then read HWPOISON test...\n");
+		status = test_hugetlb_read_hwpoison(fd, file_stat.f_bsize,
+						    wr_chunk_sizes[i], true);
+		printf(PREFIX "HugeTLB seek then read HWPOISON test...%s\n",
+		       status_to_str(status));
+		close(fd);
+		if (status == TEST_FAILED)
+			return -1;
+	}
+
+	return 0;
+
+create_failure:
+	printf(ERROR_PREFIX "Abort test: failed to create hugetlbfs file\n");
+	return -1;
+}
-- 
2.41.0.255.g8b1d071c50-goog



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read
  2023-07-13  0:18 ` [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read Jiaqi Yan
@ 2024-01-05  6:27   ` Muhammad Usama Anjum
  2024-01-05 21:13     ` Jiaqi Yan
  0 siblings, 1 reply; 19+ messages in thread
From: Muhammad Usama Anjum @ 2024-01-05  6:27 UTC (permalink / raw)
  To: Jiaqi Yan, linmiaohe, mike.kravetz, naoya.horiguchi
  Cc: Muhammad Usama Anjum, akpm, songmuchun, shy828301, linux-mm,
	linux-kernel, jthoughton, kernel

Hi,

I'm trying to convert this test to TAP as I think the failures sometimes go
unnoticed on CI systems if we only depend on the return value of the
application. I've enabled the following configurations which aren't already
present in tools/testing/selftests/mm/config:
CONFIG_MEMORY_FAILURE=y
CONFIG_HWPOISON_INJECT=m

I'll send a patch to add these configs later. Right now I'm trying to
investigate the failure when injecting the poison page via
madvise(MADV_HWPOISON). I'm getting "device busy" (EBUSY) every single
time. The test fails as it doesn't expect the hugetlb memory to be busy.
I'm not sure if the poison handling code has issues or the test isn't
robust enough.

./hugetlb-read-hwpoison
Write/read chunk size=0x800
 ... HugeTLB read regression test...
 ...  ... expect to read 0x200000 bytes of data in total
 ...  ... actually read 0x200000 bytes of data in total
 ... HugeTLB read regression test...TEST_PASSED
 ... HugeTLB read HWPOISON test...
[    9.280854] Injecting memory failure for pfn 0x102f01 at process virtual
address 0x7f28ec101000
[    9.282029] Memory failure: 0x102f01: huge page still referenced by 511
users
[    9.282987] Memory failure: 0x102f01: recovery action for huge page: Failed
 ...  !!! MADV_HWPOISON failed: Device or resource busy
 ... HugeTLB read HWPOISON test...TEST_FAILED

I'm testing on v6.7-rc8. Not sure if this was working previously or not.

Regards,
Usama

On 7/13/23 5:18 AM, Jiaqi Yan wrote:
> Add tests for the improvement made to read operation on HWPOISON
> hugetlb page with different read granularities. For each chunk size,
> three read scenarios are tested:
> 1. Simple regression test on read without HWPOISON.
> 2. Sequential read page by page should succeed until encounters the 1st
>    raw HWPOISON subpage.
> 3. After skip a raw HWPOISON subpage by lseek, read()s always succeed.
> 
> Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> [...]
> +	const unsigned long pagesize = getpagesize();
> +
> +	if (ftruncate(fd, len) < 0) {
> +		perror(PREFIX ERROR_PREFIX "ftruncate failed");
> +		return status;
> +	}
> +
> +	filemap = mmap(NULL, len, PROT_READ | PROT_WRITE,
> +		       MAP_SHARED | MAP_POPULATE, fd, 0);
> +	if (filemap == MAP_FAILED) {
> +		perror(PREFIX ERROR_PREFIX "mmap for primary mapping failed");
> +		goto done;
> +	}
> +
> +	setup_filemap(filemap, len, wr_chunk_size);
> +	status = TEST_FAILED;
> +
> +	/*
> +	 * Poisoned hugetlb page layout (assume hugepagesize=2MB):
> +	 * |<---------------------- 1MB ---------------------->|
> +	 * |<---- healthy page ---->|<---- HWPOISON page ----->|
> +	 * |<------------------- (1MB - 8KB) ----------------->|
> +	 */
> +	hwp_addr = filemap + len / 2 + pagesize;
> +	if (madvise(hwp_addr, pagesize, MADV_HWPOISON) < 0) {
> +		perror(PREFIX ERROR_PREFIX "MADV_HWPOISON failed");
> +		goto unmap;
> +	}
> +
> +	if (!skip_hwpoison_page) {
> +		/*
> +		 * Userspace should be able to read (1MB + 1 page) from
> +		 * the beginning of the HWPOISONed hugepage.
> +		 */
> +		if (read_hugepage_filemap(fd, len, wr_chunk_size,
> +					  len / 2 + pagesize))
> +			status = TEST_PASSED;
> +	} else {
> +		/*
> +		 * Userspace should be able to read (1MB - 2 pages) from
> +		 * HWPOISONed hugepage.
> +		 */
> +		if (seek_read_hugepage_filemap(fd, len, wr_chunk_size,
> +					       len / 2 + MAX(2 * pagesize, wr_chunk_size),
> +					       len / 2 - MAX(2 * pagesize, wr_chunk_size)))
> +			status = TEST_PASSED;
> +	}
> +
> +unmap:
> +	munmap(filemap, len);
> +done:
> +	if (ftruncate(fd, 0) < 0) {
> +		perror(PREFIX ERROR_PREFIX "ftruncate back to 0 failed");
> +		status = TEST_FAILED;
> +	}
> +
> +	return status;
> +}
> +
> +static int create_hugetlbfs_file(struct statfs *file_stat)
> +{
> +	int fd;
> +
> +	fd = memfd_create("hugetlb_tmp", MFD_HUGETLB);
> +	if (fd < 0) {
> +		perror(PREFIX ERROR_PREFIX "could not open hugetlbfs file");
> +		return -1;
> +	}
> +
> +	memset(file_stat, 0, sizeof(*file_stat));
> +	if (fstatfs(fd, file_stat)) {
> +		perror(PREFIX ERROR_PREFIX "fstatfs failed");
> +		goto close;
> +	}
> +	if (file_stat->f_type != HUGETLBFS_MAGIC) {
> +		printf(PREFIX ERROR_PREFIX "not hugetlbfs file\n");
> +		goto close;
> +	}
> +
> +	return fd;
> +close:
> +	close(fd);
> +	return -1;
> +}
> +
> +int main(void)
> +{
> +	int fd;
> +	struct statfs file_stat;
> +	enum test_status status;
> +	/* Test read() in different granularity. */
> +	size_t wr_chunk_sizes[] = {
> +		getpagesize() / 2, getpagesize(),
> +		getpagesize() * 2, getpagesize() * 4
> +	};
> +	size_t i;
> +
> +	for (i = 0; i < ARRAY_SIZE(wr_chunk_sizes); ++i) {
> +		printf("Write/read chunk size=0x%lx\n",
> +		       wr_chunk_sizes[i]);
> +
> +		fd = create_hugetlbfs_file(&file_stat);
> +		if (fd < 0)
> +			goto create_failure;
> +		printf(PREFIX "HugeTLB read regression test...\n");
> +		status = test_hugetlb_read(fd, file_stat.f_bsize,
> +					   wr_chunk_sizes[i]);
> +		printf(PREFIX "HugeTLB read regression test...%s\n",
> +		       status_to_str(status));
> +		close(fd);
> +		if (status == TEST_FAILED)
> +			return -1;
> +
> +		fd = create_hugetlbfs_file(&file_stat);
> +		if (fd < 0)
> +			goto create_failure;
> +		printf(PREFIX "HugeTLB read HWPOISON test...\n");
> +		status = test_hugetlb_read_hwpoison(fd, file_stat.f_bsize,
> +						    wr_chunk_sizes[i], false);
> +		printf(PREFIX "HugeTLB read HWPOISON test...%s\n",
> +		       status_to_str(status));
> +		close(fd);
> +		if (status == TEST_FAILED)
> +			return -1;
> +
> +		fd = create_hugetlbfs_file(&file_stat);
> +		if (fd < 0)
> +			goto create_failure;
> +		printf(PREFIX "HugeTLB seek then read HWPOISON test...\n");
> +		status = test_hugetlb_read_hwpoison(fd, file_stat.f_bsize,
> +						    wr_chunk_sizes[i], true);
> +		printf(PREFIX "HugeTLB seek then read HWPOISON test...%s\n",
> +		       status_to_str(status));
> +		close(fd);
> +		if (status == TEST_FAILED)
> +			return -1;
> +	}
> +
> +	return 0;
> +
> +create_failure:
> +	printf(ERROR_PREFIX "Abort test: failed to create hugetlbfs file\n");
> +	return -1;
> +}

-- 
BR,
Muhammad Usama Anjum


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read
  2024-01-05  6:27   ` Muhammad Usama Anjum
@ 2024-01-05 21:13     ` Jiaqi Yan
  2024-01-10  6:49       ` Muhammad Usama Anjum
  0 siblings, 1 reply; 19+ messages in thread
From: Jiaqi Yan @ 2024-01-05 21:13 UTC (permalink / raw)
  To: Muhammad Usama Anjum
  Cc: linmiaohe, mike.kravetz, naoya.horiguchi, akpm, songmuchun,
	shy828301, linux-mm, linux-kernel, jthoughton, kernel

On Thu, Jan 4, 2024 at 10:27 PM Muhammad Usama Anjum
<usama.anjum@collabora.com> wrote:
>
> Hi,
>
> I'm trying to convert this test to TAP as I think the failures sometimes go
> unnoticed on CI systems if we only depend on the return value of the
> application. I've enabled the following configurations which aren't already
> present in tools/testing/selftests/mm/config:
> CONFIG_MEMORY_FAILURE=y
> CONFIG_HWPOISON_INJECT=m
>
> I'll send a patch to add these configs later. Right now I'm trying to
> investigate a failure when injecting a poisoned page via
> madvise(MADV_HWPOISON). I'm getting EBUSY (device busy) every single time.
> The test fails as it doesn't expect the hugetlb memory to be busy. I'm not
> sure if the poison handling code has issues or the test isn't robust enough.
>
> ./hugetlb-read-hwpoison
> Write/read chunk size=0x800
>  ... HugeTLB read regression test...
>  ...  ... expect to read 0x200000 bytes of data in total
>  ...  ... actually read 0x200000 bytes of data in total
>  ... HugeTLB read regression test...TEST_PASSED
>  ... HugeTLB read HWPOISON test...
> [    9.280854] Injecting memory failure for pfn 0x102f01 at process virtual
> address 0x7f28ec101000
> [    9.282029] Memory failure: 0x102f01: huge page still referenced by 511
> users
> [    9.282987] Memory failure: 0x102f01: recovery action for huge page: Failed
>  ...  !!! MADV_HWPOISON failed: Device or resource busy
>  ... HugeTLB read HWPOISON test...TEST_FAILED
>
> I'm testing on v6.7-rc8. Not sure if this was working previously or not.

Thanks for reporting this, Usama!

I am also able to repro MADV_HWPOISON failure at "501a06fe8e4c
(akpm/mm-stable, mm-stable) zswap: memcontrol: implement zswap
writeback disabling."

Then I checked out the earliest commit "ba91e7e5d15a (HEAD -> Base)
selftests/mm: add tests for HWPOISON hugetlbfs read". The
MADV_HWPOISON injection works and and the test passes:

 ... HugeTLB read HWPOISON test...
 ...  ... expect to read 0x101000 bytes of data in total
 ...  !!! read failed: Input/output error
 ...  ... actually read 0x101000 bytes of data in total
 ... HugeTLB read HWPOISON test...TEST_PASSED
 ... HugeTLB seek then read HWPOISON test...
 ...  ... init val=4 with offset=0x102000
 ...  ... expect to read 0xfe000 bytes of data in total
 ...  ... actually read 0xfe000 bytes of data in total
 ... HugeTLB seek then read HWPOISON test...TEST_PASSED
 ...

[ 2109.209225] Injecting memory failure for pfn 0x3190d01 at process
virtual address 0x7f75e3101000
[ 2109.209438] Memory failure: 0x3190d01: recovery action for huge
page: Recovered
...

I think some commit in between broke MADV_HWPOISON on hugetlbfs, and we
should be able to figure out which via bisection (and of course by
reading the delta commits between the two points; it is probably
related to page refcount).

That being said, I will be on vacation from tomorrow until the end of
next week. So I will get back to this after next weekend. Meanwhile if
you want to go ahead and bisect the problematic commit, that will be
very much appreciated.

Thanks,
Jiaqi


>
> Regards,
> Usama
>
> On 7/13/23 5:18 AM, Jiaqi Yan wrote:
> > Add tests for the improvement made to read operation on HWPOISON
> > hugetlb page with different read granularities. For each chunk size,
> > three read scenarios are tested:
> > 1. Simple regression test on read without HWPOISON.
> > 2. Sequential read page by page should succeed until encounters the 1st
> >    raw HWPOISON subpage.
> > 3. After skip a raw HWPOISON subpage by lseek, read()s always succeed.
> >
> > Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
> > Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> > ---
> >  tools/testing/selftests/mm/.gitignore         |   1 +
> >  tools/testing/selftests/mm/Makefile           |   1 +
> >  .../selftests/mm/hugetlb-read-hwpoison.c      | 322 ++++++++++++++++++
> >  3 files changed, 324 insertions(+)
> >  create mode 100644 tools/testing/selftests/mm/hugetlb-read-hwpoison.c
> >
> > diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore
> > index 7e2a982383c0..cdc9ce4426b9 100644
> > --- a/tools/testing/selftests/mm/.gitignore
> > +++ b/tools/testing/selftests/mm/.gitignore
> > @@ -5,6 +5,7 @@ hugepage-mremap
> >  hugepage-shm
> >  hugepage-vmemmap
> >  hugetlb-madvise
> > +hugetlb-read-hwpoison
> >  khugepaged
> >  map_hugetlb
> >  map_populate
> > diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
> > index 66d7c07dc177..b7fce9073279 100644
> > --- a/tools/testing/selftests/mm/Makefile
> > +++ b/tools/testing/selftests/mm/Makefile
> > @@ -41,6 +41,7 @@ TEST_GEN_PROGS += gup_longterm
> >  TEST_GEN_PROGS += gup_test
> >  TEST_GEN_PROGS += hmm-tests
> >  TEST_GEN_PROGS += hugetlb-madvise
> > +TEST_GEN_PROGS += hugetlb-read-hwpoison
> >  TEST_GEN_PROGS += hugepage-mmap
> >  TEST_GEN_PROGS += hugepage-mremap
> >  TEST_GEN_PROGS += hugepage-shm
> > diff --git a/tools/testing/selftests/mm/hugetlb-read-hwpoison.c b/tools/testing/selftests/mm/hugetlb-read-hwpoison.c
> > new file mode 100644
> > index 000000000000..ba6cc6f9cabc
> > --- /dev/null
> > +++ b/tools/testing/selftests/mm/hugetlb-read-hwpoison.c
> > @@ -0,0 +1,322 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#define _GNU_SOURCE
> > +#include <stdlib.h>
> > +#include <stdio.h>
> > +#include <string.h>
> > +
> > +#include <linux/magic.h>
> > +#include <sys/mman.h>
> > +#include <sys/statfs.h>
> > +#include <errno.h>
> > +#include <stdbool.h>
> > +
> > +#include "../kselftest.h"
> > +
> > +#define PREFIX " ... "
> > +#define ERROR_PREFIX " !!! "
> > +
> > +#define MAX_WRITE_READ_CHUNK_SIZE (getpagesize() * 16)
> > +#define MAX(a, b) (((a) > (b)) ? (a) : (b))
> > +
> > +enum test_status {
> > +     TEST_PASSED = 0,
> > +     TEST_FAILED = 1,
> > +     TEST_SKIPPED = 2,
> > +};
> > +
> > [... remainder of hugetlb-read-hwpoison.c elided; identical to the code quoted in full earlier in the thread ...]
>
> --
> BR,
> Muhammad Usama Anjum



* Re: [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read
  2024-01-05 21:13     ` Jiaqi Yan
@ 2024-01-10  6:49       ` Muhammad Usama Anjum
  2024-01-10 10:15         ` Muhammad Usama Anjum
  0 siblings, 1 reply; 19+ messages in thread
From: Muhammad Usama Anjum @ 2024-01-10  6:49 UTC (permalink / raw)
  To: Jiaqi Yan
  Cc: Muhammad Usama Anjum, linmiaohe, mike.kravetz, naoya.horiguchi,
	akpm, songmuchun, shy828301, linux-mm, linux-kernel, jthoughton,
	kernel

On 1/6/24 2:13 AM, Jiaqi Yan wrote:
>> [... original report quoted above ...]
> 
> Thanks for reporting this, Usama!
> 
> I am also able to repro MADV_HWPOISON failure at "501a06fe8e4c
> (akpm/mm-stable, mm-stable) zswap: memcontrol: implement zswap
> writeback disabling."
> 
> Then I checked out the earliest commit "ba91e7e5d15a (HEAD -> Base)
> selftests/mm: add tests for HWPOISON hugetlbfs read". The
> MADV_HWPOISON injection works and and the test passes:
> 
>  ... HugeTLB read HWPOISON test...
>  ...  ... expect to read 0x101000 bytes of data in total
>  ...  !!! read failed: Input/output error
>  ...  ... actually read 0x101000 bytes of data in total
>  ... HugeTLB read HWPOISON test...TEST_PASSED
>  ... HugeTLB seek then read HWPOISON test...
>  ...  ... init val=4 with offset=0x102000
>  ...  ... expect to read 0xfe000 bytes of data in total
>  ...  ... actually read 0xfe000 bytes of data in total
>  ... HugeTLB seek then read HWPOISON test...TEST_PASSED
>  ...
> 
> [ 2109.209225] Injecting memory failure for pfn 0x3190d01 at process
> virtual address 0x7f75e3101000
> [ 2109.209438] Memory failure: 0x3190d01: recovery action for huge
> page: Recovered
> ...
> 
> I think something in between broken MADV_HWPOISON on hugetlbfs, and we
> should be able to figure it out via bisection (and of course by
> reading delta commits between them, probably related to page
> refcount).
Thank you for this information.

> 
> That being said, I will be on vacation from tomorrow until the end of
> next week. So I will get back to this after next weekend. Meanwhile if
> you want to go ahead and bisect the problematic commit, that will be
> very much appreciated.
I'll try to bisect and post here if I find something.

>>> +     char *filemap = NULL;
>>> +
>>> +     if (ftruncate(fd, len) < 0) {
>>> +             perror(PREFIX ERROR_PREFIX "ftruncate failed");
>>> +             return status;
>>> +     }
>>> +
>>> +     filemap = mmap(NULL, len, PROT_READ | PROT_WRITE,
>>> +                    MAP_SHARED | MAP_POPULATE, fd, 0);
>>> +     if (filemap == MAP_FAILED) {
>>> +             perror(PREFIX ERROR_PREFIX "mmap for primary mapping failed");
>>> +             goto done;
>>> +     }
>>> +
>>> +     setup_filemap(filemap, len, wr_chunk_size);
>>> +     status = TEST_FAILED;
>>> +
>>> +     if (read_hugepage_filemap(fd, len, wr_chunk_size, len))
>>> +             status = TEST_PASSED;
>>> +
>>> +     munmap(filemap, len);
>>> +done:
>>> +     if (ftruncate(fd, 0) < 0) {
>>> +             perror(PREFIX ERROR_PREFIX "ftruncate back to 0 failed");
>>> +             status = TEST_FAILED;
>>> +     }
>>> +
>>> +     return status;
>>> +}
>>> +
>>> +static enum test_status
>>> +test_hugetlb_read_hwpoison(int fd, size_t len, size_t wr_chunk_size,
>>> +                        bool skip_hwpoison_page)
>>> +{
>>> +     enum test_status status = TEST_SKIPPED;
>>> +     char *filemap = NULL;
>>> +     char *hwp_addr = NULL;
>>> +     const unsigned long pagesize = getpagesize();
>>> +
>>> +     if (ftruncate(fd, len) < 0) {
>>> +             perror(PREFIX ERROR_PREFIX "ftruncate failed");
>>> +             return status;
>>> +     }
>>> +
>>> +     filemap = mmap(NULL, len, PROT_READ | PROT_WRITE,
>>> +                    MAP_SHARED | MAP_POPULATE, fd, 0);
>>> +     if (filemap == MAP_FAILED) {
>>> +             perror(PREFIX ERROR_PREFIX "mmap for primary mapping failed");
>>> +             goto done;
>>> +     }
>>> +
>>> +     setup_filemap(filemap, len, wr_chunk_size);
>>> +     status = TEST_FAILED;
>>> +
>>> +     /*
>>> +      * Poisoned hugetlb page layout (assume hugepagesize=2MB):
>>> +      * |<---------------------- 1MB ---------------------->|
>>> +      * |<---- healthy page ---->|<---- HWPOISON page ----->|
>>> +      * |<------------------- (1MB - 8KB) ----------------->|
>>> +      */
>>> +     hwp_addr = filemap + len / 2 + pagesize;
>>> +     if (madvise(hwp_addr, pagesize, MADV_HWPOISON) < 0) {
>>> +             perror(PREFIX ERROR_PREFIX "MADV_HWPOISON failed");
>>> +             goto unmap;
>>> +     }
>>> +
>>> +     if (!skip_hwpoison_page) {
>>> +             /*
>>> +              * Userspace should be able to read (1MB + 1 page) from
>>> +              * the beginning of the HWPOISONed hugepage.
>>> +              */
>>> +             if (read_hugepage_filemap(fd, len, wr_chunk_size,
>>> +                                       len / 2 + pagesize))
>>> +                     status = TEST_PASSED;
>>> +     } else {
>>> +             /*
>>> +              * Userspace should be able to read (1MB - 2 pages) from
>>> +              * HWPOISONed hugepage.
>>> +              */
>>> +             if (seek_read_hugepage_filemap(fd, len, wr_chunk_size,
>>> +                                            len / 2 + MAX(2 * pagesize, wr_chunk_size),
>>> +                                            len / 2 - MAX(2 * pagesize, wr_chunk_size)))
>>> +                     status = TEST_PASSED;
>>> +     }
>>> +
>>> +unmap:
>>> +     munmap(filemap, len);
>>> +done:
>>> +     if (ftruncate(fd, 0) < 0) {
>>> +             perror(PREFIX ERROR_PREFIX "ftruncate back to 0 failed");
>>> +             status = TEST_FAILED;
>>> +     }
>>> +
>>> +     return status;
>>> +}
>>> +
>>> +static int create_hugetlbfs_file(struct statfs *file_stat)
>>> +{
>>> +     int fd;
>>> +
>>> +     fd = memfd_create("hugetlb_tmp", MFD_HUGETLB);
>>> +     if (fd < 0) {
>>> +             perror(PREFIX ERROR_PREFIX "could not open hugetlbfs file");
>>> +             return -1;
>>> +     }
>>> +
>>> +     memset(file_stat, 0, sizeof(*file_stat));
>>> +     if (fstatfs(fd, file_stat)) {
>>> +             perror(PREFIX ERROR_PREFIX "fstatfs failed");
>>> +             goto close;
>>> +     }
>>> +     if (file_stat->f_type != HUGETLBFS_MAGIC) {
>>> +             printf(PREFIX ERROR_PREFIX "not hugetlbfs file\n");
>>> +             goto close;
>>> +     }
>>> +
>>> +     return fd;
>>> +close:
>>> +     close(fd);
>>> +     return -1;
>>> +}
>>> +
>>> +int main(void)
>>> +{
>>> +     int fd;
>>> +     struct statfs file_stat;
>>> +     enum test_status status;
>>> +     /* Test read() in different granularity. */
>>> +     size_t wr_chunk_sizes[] = {
>>> +             getpagesize() / 2, getpagesize(),
>>> +             getpagesize() * 2, getpagesize() * 4
>>> +     };
>>> +     size_t i;
>>> +
>>> +     for (i = 0; i < ARRAY_SIZE(wr_chunk_sizes); ++i) {
>>> +             printf("Write/read chunk size=0x%lx\n",
>>> +                    wr_chunk_sizes[i]);
>>> +
>>> +             fd = create_hugetlbfs_file(&file_stat);
>>> +             if (fd < 0)
>>> +                     goto create_failure;
>>> +             printf(PREFIX "HugeTLB read regression test...\n");
>>> +             status = test_hugetlb_read(fd, file_stat.f_bsize,
>>> +                                        wr_chunk_sizes[i]);
>>> +             printf(PREFIX "HugeTLB read regression test...%s\n",
>>> +                    status_to_str(status));
>>> +             close(fd);
>>> +             if (status == TEST_FAILED)
>>> +                     return -1;
>>> +
>>> +             fd = create_hugetlbfs_file(&file_stat);
>>> +             if (fd < 0)
>>> +                     goto create_failure;
>>> +             printf(PREFIX "HugeTLB read HWPOISON test...\n");
>>> +             status = test_hugetlb_read_hwpoison(fd, file_stat.f_bsize,
>>> +                                                 wr_chunk_sizes[i], false);
>>> +             printf(PREFIX "HugeTLB read HWPOISON test...%s\n",
>>> +                    status_to_str(status));
>>> +             close(fd);
>>> +             if (status == TEST_FAILED)
>>> +                     return -1;
>>> +
>>> +             fd = create_hugetlbfs_file(&file_stat);
>>> +             if (fd < 0)
>>> +                     goto create_failure;
>>> +             printf(PREFIX "HugeTLB seek then read HWPOISON test...\n");
>>> +             status = test_hugetlb_read_hwpoison(fd, file_stat.f_bsize,
>>> +                                                 wr_chunk_sizes[i], true);
>>> +             printf(PREFIX "HugeTLB seek then read HWPOISON test...%s\n",
>>> +                    status_to_str(status));
>>> +             close(fd);
>>> +             if (status == TEST_FAILED)
>>> +                     return -1;
>>> +     }
>>> +
>>> +     return 0;
>>> +
>>> +create_failure:
>>> +     printf(ERROR_PREFIX "Abort test: failed to create hugetlbfs file\n");
>>> +     return -1;
>>> +}
>>
>> --
>> BR,
>> Muhammad Usama Anjum
> 

-- 
BR,
Muhammad Usama Anjum


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read
  2024-01-10  6:49       ` Muhammad Usama Anjum
@ 2024-01-10 10:15         ` Muhammad Usama Anjum
  2024-01-11  2:32           ` Sidhartha Kumar
  2024-01-12  6:16           ` Muhammad Usama Anjum
  0 siblings, 2 replies; 19+ messages in thread
From: Muhammad Usama Anjum @ 2024-01-10 10:15 UTC (permalink / raw)
  To: Jiaqi Yan, Sidhartha Kumar
  Cc: Muhammad Usama Anjum, linmiaohe, mike.kravetz, naoya.horiguchi,
	akpm, songmuchun, shy828301, linux-mm, linux-kernel, jthoughton,
	kernel, Matthew Wilcox (Oracle),
	Mike Kravetz, Muchun Song

On 1/10/24 11:49 AM, Muhammad Usama Anjum wrote:
> On 1/6/24 2:13 AM, Jiaqi Yan wrote:
>> On Thu, Jan 4, 2024 at 10:27 PM Muhammad Usama Anjum
>> <usama.anjum@collabora.com> wrote:
>>>
>>> Hi,
>>>
>>> I'm trying to convert this test to TAP as I think the failures sometimes go
>>> unnoticed on CI systems if we only depend on the return value of the
>>> application. I've enabled the following configurations which aren't already
>>> present in tools/testing/selftests/mm/config:
>>> CONFIG_MEMORY_FAILURE=y
>>> CONFIG_HWPOISON_INJECT=m
>>>
>>> I'll send a patch to add these configs later. Right now I'm trying to
>>> investigate the failure when we are trying to inject the poison page by
>>> madvise(MADV_HWPOISON). I'm getting device busy every single time. The test
>>> fails as it doesn't expect the hugetlb memory to be busy. I'm not
>>> sure if the poison handling code has issues or the test isn't robust enough.
>>>
>>> ./hugetlb-read-hwpoison
>>> Write/read chunk size=0x800
>>>  ... HugeTLB read regression test...
>>>  ...  ... expect to read 0x200000 bytes of data in total
>>>  ...  ... actually read 0x200000 bytes of data in total
>>>  ... HugeTLB read regression test...TEST_PASSED
>>>  ... HugeTLB read HWPOISON test...
>>> [    9.280854] Injecting memory failure for pfn 0x102f01 at process virtual
>>> address 0x7f28ec101000
>>> [    9.282029] Memory failure: 0x102f01: huge page still referenced by 511
>>> users
>>> [    9.282987] Memory failure: 0x102f01: recovery action for huge page: Failed
>>>  ...  !!! MADV_HWPOISON failed: Device or resource busy
>>>  ... HugeTLB read HWPOISON test...TEST_FAILED
>>>
>>> I'm testing on v6.7-rc8. Not sure if this was working previously or not.
>>
>> Thanks for reporting this, Usama!
>>
>> I am also able to repro MADV_HWPOISON failure at "501a06fe8e4c
>> (akpm/mm-stable, mm-stable) zswap: memcontrol: implement zswap
>> writeback disabling."
>>
>> Then I checked out the earliest commit "ba91e7e5d15a (HEAD -> Base)
>> selftests/mm: add tests for HWPOISON hugetlbfs read". The
>> MADV_HWPOISON injection works and the test passes:
>>
>>  ... HugeTLB read HWPOISON test...
>>  ...  ... expect to read 0x101000 bytes of data in total
>>  ...  !!! read failed: Input/output error
>>  ...  ... actually read 0x101000 bytes of data in total
>>  ... HugeTLB read HWPOISON test...TEST_PASSED
>>  ... HugeTLB seek then read HWPOISON test...
>>  ...  ... init val=4 with offset=0x102000
>>  ...  ... expect to read 0xfe000 bytes of data in total
>>  ...  ... actually read 0xfe000 bytes of data in total
>>  ... HugeTLB seek then read HWPOISON test...TEST_PASSED
>>  ...
>>
>> [ 2109.209225] Injecting memory failure for pfn 0x3190d01 at process
>> virtual address 0x7f75e3101000
>> [ 2109.209438] Memory failure: 0x3190d01: recovery action for huge
>> page: Recovered
>> ...
>>
>> I think something in between broke MADV_HWPOISON on hugetlbfs, and we
>> should be able to figure it out via bisection (and of course by
>> reading delta commits between them, probably related to page
>> refcount).
> Thank you for this information.
> 
>>
>> That being said, I will be on vacation from tomorrow until the end of
>> next week. So I will get back to this after next weekend. Meanwhile if
>> you want to go ahead and bisect the problematic commit, that will be
>> very much appreciated.
> I'll try to bisect and post here if I find something.
Found the culprit commit by bisection:

a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3
mm/filemap: remove hugetlb special casing in filemap.c

hugetlb-read-hwpoison started failing with this patch. I've added the
author of this patch to this bug report.

> 
>>
>> Thanks,
>> Jiaqi
>>
>>
>>>
>>> Regards,
>>> Usama
>>>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read
  2024-01-10 10:15         ` Muhammad Usama Anjum
@ 2024-01-11  2:32           ` Sidhartha Kumar
  2024-01-11  8:48             ` Muhammad Usama Anjum
  2024-01-12  6:16           ` Muhammad Usama Anjum
  1 sibling, 1 reply; 19+ messages in thread
From: Sidhartha Kumar @ 2024-01-11  2:32 UTC (permalink / raw)
  To: Muhammad Usama Anjum, Jiaqi Yan
  Cc: linmiaohe, mike.kravetz, naoya.horiguchi, akpm, songmuchun,
	shy828301, linux-mm, linux-kernel, jthoughton, kernel,
	Matthew Wilcox (Oracle)

On 1/10/24 2:15 AM, Muhammad Usama Anjum wrote:
> On 1/10/24 11:49 AM, Muhammad Usama Anjum wrote:
>> On 1/6/24 2:13 AM, Jiaqi Yan wrote:
>>> On Thu, Jan 4, 2024 at 10:27 PM Muhammad Usama Anjum
>>> <usama.anjum@collabora.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm trying to convert this test to TAP as I think the failures sometimes go
>>>> unnoticed on CI systems if we only depend on the return value of the
>>>> application. I've enabled the following configurations which aren't already
>>>> present in tools/testing/selftests/mm/config:
>>>> CONFIG_MEMORY_FAILURE=y
>>>> CONFIG_HWPOISON_INJECT=m
>>>>
>>>> I'll send a patch to add these configs later. Right now I'm trying to
>>>> investigate the failure when we are trying to inject the poison page by
>>>> madvise(MADV_HWPOISON). I'm getting device busy every single time. The test
>>>> fails as it doesn't expect the hugetlb memory to be busy. I'm not
>>>> sure if the poison handling code has issues or the test isn't robust enough.
>>>>
>>>> ./hugetlb-read-hwpoison
>>>> Write/read chunk size=0x800
>>>>   ... HugeTLB read regression test...
>>>>   ...  ... expect to read 0x200000 bytes of data in total
>>>>   ...  ... actually read 0x200000 bytes of data in total
>>>>   ... HugeTLB read regression test...TEST_PASSED
>>>>   ... HugeTLB read HWPOISON test...
>>>> [    9.280854] Injecting memory failure for pfn 0x102f01 at process virtual
>>>> address 0x7f28ec101000
>>>> [    9.282029] Memory failure: 0x102f01: huge page still referenced by 511
>>>> users
>>>> [    9.282987] Memory failure: 0x102f01: recovery action for huge page: Failed
>>>>   ...  !!! MADV_HWPOISON failed: Device or resource busy
>>>>   ... HugeTLB read HWPOISON test...TEST_FAILED
>>>>
>>>> I'm testing on v6.7-rc8. Not sure if this was working previously or not.
>>>
>>> Thanks for reporting this, Usama!
>>>
>>> I am also able to repro MADV_HWPOISON failure at "501a06fe8e4c
>>> (akpm/mm-stable, mm-stable) zswap: memcontrol: implement zswap
>>> writeback disabling."
>>>
>>> Then I checked out the earliest commit "ba91e7e5d15a (HEAD -> Base)
>>> selftests/mm: add tests for HWPOISON hugetlbfs read". The
>>> MADV_HWPOISON injection works and the test passes:
>>>
>>>   ... HugeTLB read HWPOISON test...
>>>   ...  ... expect to read 0x101000 bytes of data in total
>>>   ...  !!! read failed: Input/output error
>>>   ...  ... actually read 0x101000 bytes of data in total
>>>   ... HugeTLB read HWPOISON test...TEST_PASSED
>>>   ... HugeTLB seek then read HWPOISON test...
>>>   ...  ... init val=4 with offset=0x102000
>>>   ...  ... expect to read 0xfe000 bytes of data in total
>>>   ...  ... actually read 0xfe000 bytes of data in total
>>>   ... HugeTLB seek then read HWPOISON test...TEST_PASSED
>>>   ...
>>>
>>> [ 2109.209225] Injecting memory failure for pfn 0x3190d01 at process
>>> virtual address 0x7f75e3101000
>>> [ 2109.209438] Memory failure: 0x3190d01: recovery action for huge
>>> page: Recovered
>>> ...
>>>
>>> I think something in between broke MADV_HWPOISON on hugetlbfs, and we
>>> should be able to figure it out via bisection (and of course by
>>> reading delta commits between them, probably related to page
>>> refcount).
>> Thank you for this information.
>>
>>>
>>> That being said, I will be on vacation from tomorrow until the end of
>>> next week. So I will get back to this after next weekend. Meanwhile if
>>> you want to go ahead and bisect the problematic commit, that will be
>>> very much appreciated.
>> I'll try to bisect and post here if I find something.
> Found the culprit commit by bisection:
> 
> a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3
> mm/filemap: remove hugetlb special casing in filemap.c
> 
> hugetlb-read-hwpoison started failing from this patch. I've added the
> author of this patch to this bug report.
> 
Hi Usama,

Thanks for pointing this out. After debugging, the below diff seems to fix the 
issue and allows the tests to pass again. Could you test it on your 
configuration as well, just to confirm?

Thanks,
Sidhartha

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 36132c9125f9..3a248e4f7e93 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -340,7 +340,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, 
struct iov_iter *to)
                 } else {
                         folio_unlock(folio);

-                       if (!folio_test_has_hwpoisoned(folio))
+                       if (!folio_test_hwpoison(folio))
                                 want = nr;
                         else {
                                 /*
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index d8c853b35dbb..87f6bf7d8bc1 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -973,7 +973,7 @@ struct page_state {
  static bool has_extra_refcount(struct page_state *ps, struct page *p,
                                bool extra_pins)
  {
-       int count = page_count(p) - 1;
+       int count = page_count(p) - folio_nr_pages(page_folio(p));

         if (extra_pins)
                 count -= 1;


>>
>>>
>>> Thanks,
>>> Jiaqi
>>>
>>>
>>>>
>>>> Regards,
>>>> Usama
>>>>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read
  2024-01-11  2:32           ` Sidhartha Kumar
@ 2024-01-11  8:48             ` Muhammad Usama Anjum
  2024-01-11 17:34               ` Jiaqi Yan
  0 siblings, 1 reply; 19+ messages in thread
From: Muhammad Usama Anjum @ 2024-01-11  8:48 UTC (permalink / raw)
  To: Sidhartha Kumar, Jiaqi Yan
  Cc: Muhammad Usama Anjum, linmiaohe, mike.kravetz, naoya.horiguchi,
	akpm, songmuchun, shy828301, linux-mm, linux-kernel, jthoughton,
	kernel, Matthew Wilcox (Oracle)

On 1/11/24 7:32 AM, Sidhartha Kumar wrote:
> On 1/10/24 2:15 AM, Muhammad Usama Anjum wrote:
>> On 1/10/24 11:49 AM, Muhammad Usama Anjum wrote:
>>> On 1/6/24 2:13 AM, Jiaqi Yan wrote:
>>>> On Thu, Jan 4, 2024 at 10:27 PM Muhammad Usama Anjum
>>>> <usama.anjum@collabora.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm trying to convert this test to TAP as I think the failures
>>>>> sometimes go
>>>>> unnoticed on CI systems if we only depend on the return value of the
>>>>> application. I've enabled the following configurations which aren't
>>>>> already
>>>>> present in tools/testing/selftests/mm/config:
>>>>> CONFIG_MEMORY_FAILURE=y
>>>>> CONFIG_HWPOISON_INJECT=m
>>>>>
>>>>> I'll send a patch to add these configs later. Right now I'm trying to
>>>>> investigate the failure when we are trying to inject the poison page by
>>>>> madvise(MADV_HWPOISON). I'm getting device busy every single time. The
>>>>> test
>>>>> fails as it doesn't expect the hugetlb memory to be busy. I'm not
>>>>> sure if the poison handling code has issues or the test isn't robust enough.
>>>>>
>>>>> ./hugetlb-read-hwpoison
>>>>> Write/read chunk size=0x800
>>>>>   ... HugeTLB read regression test...
>>>>>   ...  ... expect to read 0x200000 bytes of data in total
>>>>>   ...  ... actually read 0x200000 bytes of data in total
>>>>>   ... HugeTLB read regression test...TEST_PASSED
>>>>>   ... HugeTLB read HWPOISON test...
>>>>> [    9.280854] Injecting memory failure for pfn 0x102f01 at process
>>>>> virtual
>>>>> address 0x7f28ec101000
>>>>> [    9.282029] Memory failure: 0x102f01: huge page still referenced by
>>>>> 511
>>>>> users
>>>>> [    9.282987] Memory failure: 0x102f01: recovery action for huge
>>>>> page: Failed
>>>>>   ...  !!! MADV_HWPOISON failed: Device or resource busy
>>>>>   ... HugeTLB read HWPOISON test...TEST_FAILED
>>>>>
>>>>> I'm testing on v6.7-rc8. Not sure if this was working previously or not.
>>>>
>>>> Thanks for reporting this, Usama!
>>>>
>>>> I am also able to repro MADV_HWPOISON failure at "501a06fe8e4c
>>>> (akpm/mm-stable, mm-stable) zswap: memcontrol: implement zswap
>>>> writeback disabling."
>>>>
>>>> Then I checked out the earliest commit "ba91e7e5d15a (HEAD -> Base)
>>>> selftests/mm: add tests for HWPOISON hugetlbfs read". The
>>>> MADV_HWPOISON injection works and the test passes:
>>>>
>>>>   ... HugeTLB read HWPOISON test...
>>>>   ...  ... expect to read 0x101000 bytes of data in total
>>>>   ...  !!! read failed: Input/output error
>>>>   ...  ... actually read 0x101000 bytes of data in total
>>>>   ... HugeTLB read HWPOISON test...TEST_PASSED
>>>>   ... HugeTLB seek then read HWPOISON test...
>>>>   ...  ... init val=4 with offset=0x102000
>>>>   ...  ... expect to read 0xfe000 bytes of data in total
>>>>   ...  ... actually read 0xfe000 bytes of data in total
>>>>   ... HugeTLB seek then read HWPOISON test...TEST_PASSED
>>>>   ...
>>>>
>>>> [ 2109.209225] Injecting memory failure for pfn 0x3190d01 at process
>>>> virtual address 0x7f75e3101000
>>>> [ 2109.209438] Memory failure: 0x3190d01: recovery action for huge
>>>> page: Recovered
>>>> ...
>>>>
>>>> I think something in between broke MADV_HWPOISON on hugetlbfs, and we
>>>> should be able to figure it out via bisection (and of course by
>>>> reading delta commits between them, probably related to page
>>>> refcount).
>>> Thank you for this information.
>>>
>>>>
>>>> That being said, I will be on vacation from tomorrow until the end of
>>>> next week. So I will get back to this after next weekend. Meanwhile if
>>>> you want to go ahead and bisect the problematic commit, that will be
>>>> very much appreciated.
>>> I'll try to bisect and post here if I find something.
>> Found the culprit commit by bisection:
>>
>> a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3
>> mm/filemap: remove hugetlb special casing in filemap.c
>>
>> hugetlb-read-hwpoison started failing from this patch. I've added the
>> author of this patch to this bug report.
>>
> Hi Usama,
> 
> Thanks for pointing this out. After debugging, the below diff seems to fix
> the issue and allows the tests to pass again. Could you test it on your
> configuration as well, just to confirm?
> 
> Thanks,
> Sidhartha
> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 36132c9125f9..3a248e4f7e93 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -340,7 +340,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb,
> struct iov_iter *to)
>                 } else {
>                         folio_unlock(folio);
> 
> -                       if (!folio_test_has_hwpoisoned(folio))
> +                       if (!folio_test_hwpoison(folio))
>                                 want = nr;
>                         else {
>                                 /*
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index d8c853b35dbb..87f6bf7d8bc1 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -973,7 +973,7 @@ struct page_state {
>  static bool has_extra_refcount(struct page_state *ps, struct page *p,
>                                bool extra_pins)
>  {
> -       int count = page_count(p) - 1;
> +       int count = page_count(p) - folio_nr_pages(page_folio(p));
> 
>         if (extra_pins)
>                 count -= 1;
> 
Tested the patch; it fixes the test. Please send this patch.

Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com>

-- 
BR,
Muhammad Usama Anjum


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read
  2024-01-11  8:48             ` Muhammad Usama Anjum
@ 2024-01-11 17:34               ` Jiaqi Yan
  2024-01-11 17:51                 ` Sidhartha Kumar
  0 siblings, 1 reply; 19+ messages in thread
From: Jiaqi Yan @ 2024-01-11 17:34 UTC (permalink / raw)
  To: Muhammad Usama Anjum
  Cc: Sidhartha Kumar, linmiaohe, mike.kravetz, naoya.horiguchi, akpm,
	songmuchun, shy828301, linux-mm, linux-kernel, jthoughton,
	kernel, Matthew Wilcox (Oracle)

On Thu, Jan 11, 2024 at 12:48 AM Muhammad Usama Anjum
<usama.anjum@collabora.com> wrote:
>
> On 1/11/24 7:32 AM, Sidhartha Kumar wrote:
> > On 1/10/24 2:15 AM, Muhammad Usama Anjum wrote:
> >> On 1/10/24 11:49 AM, Muhammad Usama Anjum wrote:
> >>> On 1/6/24 2:13 AM, Jiaqi Yan wrote:
> >>>> On Thu, Jan 4, 2024 at 10:27 PM Muhammad Usama Anjum
> >>>> <usama.anjum@collabora.com> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> I'm trying to convert this test to TAP as I think the failures
> >>>>> sometimes go
> >>>>> unnoticed on CI systems if we only depend on the return value of the
> >>>>> application. I've enabled the following configurations which aren't
> >>>>> already
> >>>>> present in tools/testing/selftests/mm/config:
> >>>>> CONFIG_MEMORY_FAILURE=y
> >>>>> CONFIG_HWPOISON_INJECT=m
> >>>>>
> >>>>> I'll send a patch to add these configs later. Right now I'm trying to
> >>>>> investigate the failure when we are trying to inject the poison page by
> >>>>> madvise(MADV_HWPOISON). I'm getting device busy every single time. The
> >>>>> test
> >>>>> fails as it doesn't expect the hugetlb memory to be busy. I'm not
> >>>>> sure if the poison handling code has issues or the test isn't robust enough.
> >>>>>
> >>>>> ./hugetlb-read-hwpoison
> >>>>> Write/read chunk size=0x800
> >>>>>   ... HugeTLB read regression test...
> >>>>>   ...  ... expect to read 0x200000 bytes of data in total
> >>>>>   ...  ... actually read 0x200000 bytes of data in total
> >>>>>   ... HugeTLB read regression test...TEST_PASSED
> >>>>>   ... HugeTLB read HWPOISON test...
> >>>>> [    9.280854] Injecting memory failure for pfn 0x102f01 at process
> >>>>> virtual
> >>>>> address 0x7f28ec101000
> >>>>> [    9.282029] Memory failure: 0x102f01: huge page still referenced by
> >>>>> 511
> >>>>> users
> >>>>> [    9.282987] Memory failure: 0x102f01: recovery action for huge
> >>>>> page: Failed
> >>>>>   ...  !!! MADV_HWPOISON failed: Device or resource busy
> >>>>>   ... HugeTLB read HWPOISON test...TEST_FAILED
> >>>>>
> >>>>> I'm testing on v6.7-rc8. Not sure if this was working previously or not.
> >>>>
> >>>> Thanks for reporting this, Usama!
> >>>>
> >>>> I am also able to repro MADV_HWPOISON failure at "501a06fe8e4c
> >>>> (akpm/mm-stable, mm-stable) zswap: memcontrol: implement zswap
> >>>> writeback disabling."
> >>>>
> >>>> Then I checked out the earliest commit "ba91e7e5d15a (HEAD -> Base)
> >>>> selftests/mm: add tests for HWPOISON hugetlbfs read". The
> >>>> MADV_HWPOISON injection works and the test passes:
> >>>>
> >>>>   ... HugeTLB read HWPOISON test...
> >>>>   ...  ... expect to read 0x101000 bytes of data in total
> >>>>   ...  !!! read failed: Input/output error
> >>>>   ...  ... actually read 0x101000 bytes of data in total
> >>>>   ... HugeTLB read HWPOISON test...TEST_PASSED
> >>>>   ... HugeTLB seek then read HWPOISON test...
> >>>>   ...  ... init val=4 with offset=0x102000
> >>>>   ...  ... expect to read 0xfe000 bytes of data in total
> >>>>   ...  ... actually read 0xfe000 bytes of data in total
> >>>>   ... HugeTLB seek then read HWPOISON test...TEST_PASSED
> >>>>   ...
> >>>>
> >>>> [ 2109.209225] Injecting memory failure for pfn 0x3190d01 at process
> >>>> virtual address 0x7f75e3101000
> >>>> [ 2109.209438] Memory failure: 0x3190d01: recovery action for huge
> >>>> page: Recovered
> >>>> ...
> >>>>
> >>>> I think something in between broke MADV_HWPOISON on hugetlbfs, and we
> >>>> should be able to figure it out via bisection (and of course by
> >>>> reading delta commits between them, probably related to page
> >>>> refcount).
> >>> Thank you for this information.
> >>>
> >>>>
> >>>> That being said, I will be on vacation from tomorrow until the end of
> >>>> next week. So I will get back to this after next weekend. Meanwhile if
> >>>> you want to go ahead and bisect the problematic commit, that will be
> >>>> very much appreciated.
> >>> I'll try to bisect and post here if I find something.
> >> Found the culprit commit by bisection:
> >>
> >> a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3
> >> mm/filemap: remove hugetlb special casing in filemap.c

Thanks Usama!

> >>
> >> hugetlb-read-hwpoison started failing from this patch. I've added the
> >> author of this patch to this bug report.
> >>
> > Hi Usama,
> >
> > Thanks for pointing this out. After debugging, the below diff seems to fix
> > the issue and allows the tests to pass again. Could you test it on your
> > configuration as well, just to confirm?
> >
> > Thanks,
> > Sidhartha
> >
> > diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> > index 36132c9125f9..3a248e4f7e93 100644
> > --- a/fs/hugetlbfs/inode.c
> > +++ b/fs/hugetlbfs/inode.c
> > @@ -340,7 +340,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb,
> > struct iov_iter *to)
> >                 } else {
> >                         folio_unlock(folio);
> >
> > -                       if (!folio_test_has_hwpoisoned(folio))
> > +                       if (!folio_test_hwpoison(folio))

Sidhartha, just curious: why is this change needed? Does
PageHasHWPoisoned change after commit
"a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3"?

> >                                 want = nr;
> >                         else {
> >                                 /*
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index d8c853b35dbb..87f6bf7d8bc1 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -973,7 +973,7 @@ struct page_state {
> >  static bool has_extra_refcount(struct page_state *ps, struct page *p,
> >                                bool extra_pins)
> >  {
> > -       int count = page_count(p) - 1;
> > +       int count = page_count(p) - folio_nr_pages(page_folio(p));
> >
> >         if (extra_pins)
> >                 count -= 1;
> >
> Tested the patch, it fixes the test. Please send this patch.
>
> Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
>
> --
> BR,
> Muhammad Usama Anjum


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read
  2024-01-11 17:34               ` Jiaqi Yan
@ 2024-01-11 17:51                 ` Sidhartha Kumar
  2024-01-11 18:03                   ` Matthew Wilcox
  0 siblings, 1 reply; 19+ messages in thread
From: Sidhartha Kumar @ 2024-01-11 17:51 UTC (permalink / raw)
  To: Jiaqi Yan, Muhammad Usama Anjum
  Cc: linmiaohe, mike.kravetz, naoya.horiguchi, akpm, songmuchun,
	shy828301, linux-mm, linux-kernel, jthoughton, kernel,
	Matthew Wilcox (Oracle)

On 1/11/24 9:34 AM, Jiaqi Yan wrote:
> On Thu, Jan 11, 2024 at 12:48 AM Muhammad Usama Anjum
> <usama.anjum@collabora.com> wrote:
>>
>> On 1/11/24 7:32 AM, Sidhartha Kumar wrote:
>>> On 1/10/24 2:15 AM, Muhammad Usama Anjum wrote:
>>>> On 1/10/24 11:49 AM, Muhammad Usama Anjum wrote:
>>>>> On 1/6/24 2:13 AM, Jiaqi Yan wrote:
>>>>>> On Thu, Jan 4, 2024 at 10:27 PM Muhammad Usama Anjum
>>>>>> <usama.anjum@collabora.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm trying to convert this test to TAP as I think the failures
>>>>>>> sometimes go
>>>>>>> unnoticed on CI systems if we only depend on the return value of the
>>>>>>> application. I've enabled the following configurations which aren't
>>>>>>> already
>>>>>>> present in tools/testing/selftests/mm/config:
>>>>>>> CONFIG_MEMORY_FAILURE=y
>>>>>>> CONFIG_HWPOISON_INJECT=m
>>>>>>>
>>>>>>> I'll send a patch to add these configs later. Right now I'm trying to
>>>>>>> investigate the failure when we are trying to inject the poison page by
>>>>>>> madvise(MADV_HWPOISON). I'm getting device busy every single time. The
>>>>>>> test
>>>>>>> fails as it doesn't expect the hugetlb memory to be busy. I'm not
>>>>>>> sure if the poison handling code has issues or test isn't robust enough.
>>>>>>>
>>>>>>> ./hugetlb-read-hwpoison
>>>>>>> Write/read chunk size=0x800
>>>>>>>    ... HugeTLB read regression test...
>>>>>>>    ...  ... expect to read 0x200000 bytes of data in total
>>>>>>>    ...  ... actually read 0x200000 bytes of data in total
>>>>>>>    ... HugeTLB read regression test...TEST_PASSED
>>>>>>>    ... HugeTLB read HWPOISON test...
>>>>>>> [    9.280854] Injecting memory failure for pfn 0x102f01 at process
>>>>>>> virtual
>>>>>>> address 0x7f28ec101000
>>>>>>> [    9.282029] Memory failure: 0x102f01: huge page still referenced by
>>>>>>> 511
>>>>>>> users
>>>>>>> [    9.282987] Memory failure: 0x102f01: recovery action for huge
>>>>>>> page: Failed
>>>>>>>    ...  !!! MADV_HWPOISON failed: Device or resource busy
>>>>>>>    ... HugeTLB read HWPOISON test...TEST_FAILED
>>>>>>>
>>>>>>> I'm testing on v6.7-rc8. Not sure if this was working previously or not.
>>>>>>
>>>>>> Thanks for reporting this, Usama!
>>>>>>
>>>>>> I am also able to repro MADV_HWPOISON failure at "501a06fe8e4c
>>>>>> (akpm/mm-stable, mm-stable) zswap: memcontrol: implement zswap
>>>>>> writeback disabling."
>>>>>>
>>>>>> Then I checked out the earliest commit "ba91e7e5d15a (HEAD -> Base)
>>>>>> selftests/mm: add tests for HWPOISON hugetlbfs read". The
>>>>>> MADV_HWPOISON injection works and the test passes:
>>>>>>
>>>>>>    ... HugeTLB read HWPOISON test...
>>>>>>    ...  ... expect to read 0x101000 bytes of data in total
>>>>>>    ...  !!! read failed: Input/output error
>>>>>>    ...  ... actually read 0x101000 bytes of data in total
>>>>>>    ... HugeTLB read HWPOISON test...TEST_PASSED
>>>>>>    ... HugeTLB seek then read HWPOISON test...
>>>>>>    ...  ... init val=4 with offset=0x102000
>>>>>>    ...  ... expect to read 0xfe000 bytes of data in total
>>>>>>    ...  ... actually read 0xfe000 bytes of data in total
>>>>>>    ... HugeTLB seek then read HWPOISON test...TEST_PASSED
>>>>>>    ...
>>>>>>
>>>>>> [ 2109.209225] Injecting memory failure for pfn 0x3190d01 at process
>>>>>> virtual address 0x7f75e3101000
>>>>>> [ 2109.209438] Memory failure: 0x3190d01: recovery action for huge
>>>>>> page: Recovered
>>>>>> ...
>>>>>>
>>>>>> I think something in between broke MADV_HWPOISON on hugetlbfs, and we
>>>>>> should be able to figure it out via bisection (and of course by
>>>>>> reading delta commits between them, probably related to page
>>>>>> refcount).
>>>>> Thank you for this information.
>>>>>
>>>>>>
>>>>>> That being said, I will be on vacation from tomorrow until the end of
>>>>>> next week. So I will get back to this after next weekend. Meanwhile if
>>>>>> you want to go ahead and bisect the problematic commit, that will be
>>>>>> very much appreciated.
>>>>> I'll try to bisect and post here if I find something.
>>>> Found the culprit commit by bisection:
>>>>
>>>> a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3
>>>> mm/filemap: remove hugetlb special casing in filemap.c
> 
> Thanks Usama!
> 
>>>>
>>>> hugetlb-read-hwpoison started failing from this patch. I've added the
>>>> author of this patch to this bug report.
>>>>
>>> Hi Usama,
>>>
>>> Thanks for pointing this out. After debugging, the below diff seems to fix
>>> the issue and allows the tests to pass again. Could you test it on your
>>> configuration as well just to confirm.
>>>
>>> Thanks,
>>> Sidhartha
>>>
>>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>>> index 36132c9125f9..3a248e4f7e93 100644
>>> --- a/fs/hugetlbfs/inode.c
>>> +++ b/fs/hugetlbfs/inode.c
>>> @@ -340,7 +340,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb,
>>> struct iov_iter *to)
>>>                  } else {
>>>                          folio_unlock(folio);
>>>
>>> -                       if (!folio_test_has_hwpoisoned(folio))
>>> +                       if (!folio_test_hwpoison(folio))
> 
> Sidhartha, just curious why this change is needed? Does
> PageHasHWPoisoned change after commit
> "a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3"?
> 

No, it's not an issue with PageHasHWPoisoned(); the original code was testing 
the wrong flag. I realized that has_hwpoison and hwpoison are two different 
flags: the memory-failure code calls folio_test_set_hwpoison() to set the 
hwpoison flag and does not set the has_hwpoison flag. While debugging, I realized 
this if statement was never true despite the code hitting 
folio_test_set_hwpoison(). Now we are testing the correct flag.

 From page-flags.h

#ifdef CONFIG_MEMORY_FAILURE
	PG_hwpoison,		/* hardware poisoned page. Don't touch */
#endif

folio_test_hwpoison() checks this flag ^^^

/* At least one page in this folio has the hwpoison flag set */
PG_has_hwpoisoned = PG_error,

while folio_test_has_hwpoisoned() checks this flag ^^^


Thanks,
Sidhartha




>>>                                  want = nr;
>>>                          else {
>>>                                  /*
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index d8c853b35dbb..87f6bf7d8bc1 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -973,7 +973,7 @@ struct page_state {
>>>   static bool has_extra_refcount(struct page_state *ps, struct page *p,
>>>                                 bool extra_pins)
>>>   {
>>> -       int count = page_count(p) - 1;
>>> +       int count = page_count(p) - folio_nr_pages(page_folio(p));
>>>
>>>          if (extra_pins)
>>>                  count -= 1;
>>>
>> Tested the patch, it fixes the test. Please send this patch.
>>
>> Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
>>
>> --
>> BR,
>> Muhammad Usama Anjum




* Re: [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read
  2024-01-11 17:51                 ` Sidhartha Kumar
@ 2024-01-11 18:03                   ` Matthew Wilcox
  2024-01-11 18:11                     ` Sidhartha Kumar
  0 siblings, 1 reply; 19+ messages in thread
From: Matthew Wilcox @ 2024-01-11 18:03 UTC (permalink / raw)
  To: Sidhartha Kumar
  Cc: Jiaqi Yan, Muhammad Usama Anjum, linmiaohe, mike.kravetz,
	naoya.horiguchi, akpm, songmuchun, shy828301, linux-mm,
	linux-kernel, jthoughton, kernel

On Thu, Jan 11, 2024 at 09:51:47AM -0800, Sidhartha Kumar wrote:
> On 1/11/24 9:34 AM, Jiaqi Yan wrote:
> > > -                       if (!folio_test_has_hwpoisoned(folio))
> > > +                       if (!folio_test_hwpoison(folio))
> > 
> > Sidhartha, just curious why this change is needed? Does
> > PageHasHWPoisoned change after commit
> > "a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3"?
> 
> No its not an issue PageHasHWPoisoned(), the original code is testing for
> the wrong flag and I realized that has_hwpoison and hwpoison are two
> different flags. The memory-failure code calls folio_test_set_hwpoison() to
> set the hwpoison flag and does not set the has_hwpoison flag. When
> debugging, I realized this if statement was never true despite the code
> hitting folio_test_set_hwpoison(). Now we are testing the correct flag.
> 
> From page-flags.h
> 
> #ifdef CONFIG_MEMORY_FAILURE
> 	PG_hwpoison,		/* hardware poisoned page. Don't touch */
> #endif
> 
> folio_test_hwpoison() checks this flag ^^^
> 
> /* At least one page in this folio has the hwpoison flag set */
> PG_has_hwpoisoned = PG_error,
> 
> while folio_test_has_hwpoisoned() checks this flag ^^^

So what you're saying is that hugetlb behaves differently from THP
with how memory-failure sets the flags?



* Re: [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read
  2024-01-11 18:03                   ` Matthew Wilcox
@ 2024-01-11 18:11                     ` Sidhartha Kumar
  2024-01-11 18:30                       ` Jiaqi Yan
  0 siblings, 1 reply; 19+ messages in thread
From: Sidhartha Kumar @ 2024-01-11 18:11 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jiaqi Yan, Muhammad Usama Anjum, linmiaohe, mike.kravetz,
	naoya.horiguchi, akpm, songmuchun, shy828301, linux-mm,
	linux-kernel, jthoughton, kernel

On 1/11/24 10:03 AM, Matthew Wilcox wrote:
> On Thu, Jan 11, 2024 at 09:51:47AM -0800, Sidhartha Kumar wrote:
>> On 1/11/24 9:34 AM, Jiaqi Yan wrote:
>>>> -                       if (!folio_test_has_hwpoisoned(folio))
>>>> +                       if (!folio_test_hwpoison(folio))
>>>
>>> Sidhartha, just curious why this change is needed? Does
>>> PageHasHWPoisoned change after commit
>>> "a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3"?
>>
>> No its not an issue PageHasHWPoisoned(), the original code is testing for
>> the wrong flag and I realized that has_hwpoison and hwpoison are two
>> different flags. The memory-failure code calls folio_test_set_hwpoison() to
>> set the hwpoison flag and does not set the has_hwpoison flag. When
>> debugging, I realized this if statement was never true despite the code
>> hitting folio_test_set_hwpoison(). Now we are testing the correct flag.
>>
>>  From page-flags.h
>>
>> #ifdef CONFIG_MEMORY_FAILURE
>> 	PG_hwpoison,		/* hardware poisoned page. Don't touch */
>> #endif
>>
>> folio_test_hwpoison() checks this flag ^^^
>>
>> /* At least one page in this folio has the hwpoison flag set */
>> PG_has_hwpoisoned = PG_error,
>>
>> while folio_test_has_hwpoisoned() checks this flag ^^^
> 
> So what you're saying is that hugetlb behaves differently from THP
> with how memory-failure sets the flags?

I think so, in memory_failure() THP goes through this path:
	
	hpage = compound_head(p);
	if (PageTransHuge(hpage)) {
		/*
		 * The flag must be set after the refcount is bumped
		 * otherwise it may race with THP split.
		 * And the flag can't be set in get_hwpoison_page() since
		 * it is called by soft offline too and it is just called
		 * for !MF_COUNT_INCREASED.  So here seems to be the best
		 * place.
		 *
		 * Don't need care about the above error handling paths for
		 * get_hwpoison_page() since they handle either free page
		 * or unhandlable page.  The refcount is bumped iff the
		 * page is a valid handlable page.
		 */
		SetPageHasHWPoisoned(hpage);

which sets has_hwpoisoned flag while hugetlb goes through 
folio_set_hugetlb_hwpoison() which calls folio_test_set_hwpoison().



* Re: [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read
  2024-01-11 18:11                     ` Sidhartha Kumar
@ 2024-01-11 18:30                       ` Jiaqi Yan
  2024-01-11 18:36                         ` Sidhartha Kumar
  0 siblings, 1 reply; 19+ messages in thread
From: Jiaqi Yan @ 2024-01-11 18:30 UTC (permalink / raw)
  To: Sidhartha Kumar
  Cc: Matthew Wilcox, Muhammad Usama Anjum, linmiaohe, mike.kravetz,
	naoya.horiguchi, akpm, songmuchun, shy828301, linux-mm,
	linux-kernel, jthoughton, kernel

On Thu, Jan 11, 2024 at 10:11 AM Sidhartha Kumar
<sidhartha.kumar@oracle.com> wrote:
>
> On 1/11/24 10:03 AM, Matthew Wilcox wrote:
> > On Thu, Jan 11, 2024 at 09:51:47AM -0800, Sidhartha Kumar wrote:
> >> On 1/11/24 9:34 AM, Jiaqi Yan wrote:
> >>>> -                       if (!folio_test_has_hwpoisoned(folio))
> >>>> +                       if (!folio_test_hwpoison(folio))
> >>>
> >>> Sidhartha, just curious why this change is needed? Does
> >>> PageHasHWPoisoned change after commit
> >>> "a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3"?
> >>
> >> No its not an issue PageHasHWPoisoned(), the original code is testing for
> >> the wrong flag and I realized that has_hwpoison and hwpoison are two
> >> different flags. The memory-failure code calls folio_test_set_hwpoison() to
> >> set the hwpoison flag and does not set the has_hwpoison flag. When
> >> debugging, I realized this if statement was never true despite the code
> >> hitting folio_test_set_hwpoison(). Now we are testing the correct flag.
> >>
> >>  From page-flags.h
> >>
> >> #ifdef CONFIG_MEMORY_FAILURE
> >>      PG_hwpoison,            /* hardware poisoned page. Don't touch */
> >> #endif
> >>
> >> folio_test_hwpoison() checks this flag ^^^
> >>
> >> /* At least one page in this folio has the hwpoison flag set */
> >> PG_has_hwpoisoned = PG_error,
> >>
> >> while folio_test_has_hwpoisoned() checks this flag ^^^
> >
> > So what you're saying is that hugetlb behaves differently from THP
> > with how memory-failure sets the flags?
>
> I think so, in memory_failure() THP goes through this path:
>
>         hpage = compound_head(p);
>         if (PageTransHuge(hpage)) {
>                 /*
>                  * The flag must be set after the refcount is bumped
>                  * otherwise it may race with THP split.
>                  * And the flag can't be set in get_hwpoison_page() since
>                  * it is called by soft offline too and it is just called
>                  * for !MF_COUNT_INCREASED.  So here seems to be the best
>                  * place.
>                  *
>                  * Don't need care about the above error handling paths for
>                  * get_hwpoison_page() since they handle either free page
>                  * or unhandlable page.  The refcount is bumped iff the
>                  * page is a valid handlable page.
>                  */
>                 SetPageHasHWPoisoned(hpage);
>
> which sets has_hwpoisoned flag while hugetlb goes through
> folio_set_hugetlb_hwpoison() which calls folio_test_set_hwpoison().

Yes, hugetlb sets the HWPoison flag on the whole hugepage once a raw
page is poisoned. Unlike THP, it can't be split to keep the other
subpages available. This "Improve hugetlbfs read on HWPOISON hugepages"
patchset only improves the fs case, as splitting is not needed there.

I found commit a08c7193e4f18 ("mm/filemap: remove hugetlb special
casing in filemap.c") has the following changes in inode.c:

--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -334,7 +334,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb
*iocb, struct iov_iter *to)
        ssize_t retval = 0;

        while (iov_iter_count(to)) {
-               struct page *page;
+               struct folio *folio;
                size_t nr, copied, want;

                /* nr is the maximum number of bytes to copy from this page */
@@ -352,18 +352,18 @@ static ssize_t hugetlbfs_read_iter(struct kiocb
*iocb, struct iov_iter *to)
                }
                nr = nr - offset;

-               /* Find the page */
-               page = find_lock_page(mapping, index);
-               if (unlikely(page == NULL)) {
+               /* Find the folio */
+               folio = filemap_lock_hugetlb_folio(h, mapping, index);
+               if (IS_ERR(folio)) {
                        /*
                         * We have a HOLE, zero out the user-buffer for the
                         * length of the hole or request.
                         */
                        copied = iov_iter_zero(nr, to);
                } else {
-                       unlock_page(page);
+                       folio_unlock(folio);

-                       if (!PageHWPoison(page))
+                       if (!folio_test_has_hwpoisoned(folio))
                                want = nr;

So I guess this "PageHWPoison => folio_test_has_hwpoisoned" change is
another regression aside from the refcount thing?



* Re: [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read
  2024-01-11 18:30                       ` Jiaqi Yan
@ 2024-01-11 18:36                         ` Sidhartha Kumar
  0 siblings, 0 replies; 19+ messages in thread
From: Sidhartha Kumar @ 2024-01-11 18:36 UTC (permalink / raw)
  To: Jiaqi Yan
  Cc: Matthew Wilcox, Muhammad Usama Anjum, linmiaohe, mike.kravetz,
	naoya.horiguchi, akpm, songmuchun, shy828301, linux-mm,
	linux-kernel, jthoughton, kernel

On 1/11/24 10:30 AM, Jiaqi Yan wrote:
> On Thu, Jan 11, 2024 at 10:11 AM Sidhartha Kumar
> <sidhartha.kumar@oracle.com> wrote:
>>
>> On 1/11/24 10:03 AM, Matthew Wilcox wrote:
>>> On Thu, Jan 11, 2024 at 09:51:47AM -0800, Sidhartha Kumar wrote:
>>>> On 1/11/24 9:34 AM, Jiaqi Yan wrote:
>>>>>> -                       if (!folio_test_has_hwpoisoned(folio))
>>>>>> +                       if (!folio_test_hwpoison(folio))
>>>>>
>>>>> Sidhartha, just curious why this change is needed? Does
>>>>> PageHasHWPoisoned change after commit
>>>>> "a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3"?
>>>>
>>>> No its not an issue PageHasHWPoisoned(), the original code is testing for
>>>> the wrong flag and I realized that has_hwpoison and hwpoison are two
>>>> different flags. The memory-failure code calls folio_test_set_hwpoison() to
>>>> set the hwpoison flag and does not set the has_hwpoison flag. When
>>>> debugging, I realized this if statement was never true despite the code
>>>> hitting folio_test_set_hwpoison(). Now we are testing the correct flag.
>>>>
>>>>   From page-flags.h
>>>>
>>>> #ifdef CONFIG_MEMORY_FAILURE
>>>>       PG_hwpoison,            /* hardware poisoned page. Don't touch */
>>>> #endif
>>>>
>>>> folio_test_hwpoison() checks this flag ^^^
>>>>
>>>> /* At least one page in this folio has the hwpoison flag set */
>>>> PG_has_hwpoisoned = PG_error,
>>>>
>>>> while folio_test_has_hwpoisoned() checks this flag ^^^
>>>
>>> So what you're saying is that hugetlb behaves differently from THP
>>> with how memory-failure sets the flags?
>>
>> I think so, in memory_failure() THP goes through this path:
>>
>>          hpage = compound_head(p);
>>          if (PageTransHuge(hpage)) {
>>                  /*
>>                   * The flag must be set after the refcount is bumped
>>                   * otherwise it may race with THP split.
>>                   * And the flag can't be set in get_hwpoison_page() since
>>                   * it is called by soft offline too and it is just called
>>                   * for !MF_COUNT_INCREASED.  So here seems to be the best
>>                   * place.
>>                   *
>>                   * Don't need care about the above error handling paths for
>>                   * get_hwpoison_page() since they handle either free page
>>                   * or unhandlable page.  The refcount is bumped iff the
>>                   * page is a valid handlable page.
>>                   */
>>                  SetPageHasHWPoisoned(hpage);
>>
>> which sets has_hwpoisoned flag while hugetlb goes through
>> folio_set_hugetlb_hwpoison() which calls folio_test_set_hwpoison().
> 
> Yes, hugetlb sets the HWPoison flag on the whole hugepage once a raw
> page is poisoned. Unlike THP, it can't be split to keep the other
> subpages available. This "Improve hugetlbfs read on HWPOISON hugepages"
> patchset only improves the fs case, as splitting is not needed there.
> 
> I found commit a08c7193e4f18 ("mm/filemap: remove hugetlb special
> casing in filemap.c") has the following changes in inode.c:
> 
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -334,7 +334,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb
> *iocb, struct iov_iter *to)
>          ssize_t retval = 0;
> 
>          while (iov_iter_count(to)) {
> -               struct page *page;
> +               struct folio *folio;
>                  size_t nr, copied, want;
> 
>                  /* nr is the maximum number of bytes to copy from this page */
> @@ -352,18 +352,18 @@ static ssize_t hugetlbfs_read_iter(struct kiocb
> *iocb, struct iov_iter *to)
>                  }
>                  nr = nr - offset;
> 
> -               /* Find the page */
> -               page = find_lock_page(mapping, index);
> -               if (unlikely(page == NULL)) {
> +               /* Find the folio */
> +               folio = filemap_lock_hugetlb_folio(h, mapping, index);
> +               if (IS_ERR(folio)) {
>                          /*
>                           * We have a HOLE, zero out the user-buffer for the
>                           * length of the hole or request.
>                           */
>                          copied = iov_iter_zero(nr, to);
>                  } else {
> -                       unlock_page(page);
> +                       folio_unlock(folio);
> 
> -                       if (!PageHWPoison(page))
> +                       if (!folio_test_has_hwpoisoned(folio))
>                                  want = nr;
> 
> So I guess this "PageHWPoison => folio_test_has_hwpoisoned" change is
> another regression aside from the refcount thing?

Yeah, this is another error. The refcount change fixes the madvise() call in the 
tests but the poison read tests still failed. The change to 
folio_test_hwpoison() fixes the poison read tests after the madvise() call 
succeeds.



* Re: [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read
  2024-01-10 10:15         ` Muhammad Usama Anjum
  2024-01-11  2:32           ` Sidhartha Kumar
@ 2024-01-12  6:16           ` Muhammad Usama Anjum
  2024-01-19 10:10             ` Linux regression tracking #update (Thorsten Leemhuis)
  1 sibling, 1 reply; 19+ messages in thread
From: Muhammad Usama Anjum @ 2024-01-12  6:16 UTC (permalink / raw)
  To: Jiaqi Yan, Sidhartha Kumar
  Cc: Muhammad Usama Anjum, linmiaohe, mike.kravetz, naoya.horiguchi,
	akpm, songmuchun, shy828301, linux-mm, linux-kernel, jthoughton,
	kernel, Matthew Wilcox (Oracle),
	Linux Regressions

On 1/10/24 3:15 PM, Muhammad Usama Anjum wrote:
> On 1/10/24 11:49 AM, Muhammad Usama Anjum wrote:
>> On 1/6/24 2:13 AM, Jiaqi Yan wrote:
>>> On Thu, Jan 4, 2024 at 10:27 PM Muhammad Usama Anjum
>>> <usama.anjum@collabora.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm trying to convert this test to TAP as I think the failures sometimes go
>>>> unnoticed on CI systems if we only depend on the return value of the
>>>> application. I've enabled the following configurations which aren't already
>>>> present in tools/testing/selftests/mm/config:
>>>> CONFIG_MEMORY_FAILURE=y
>>>> CONFIG_HWPOISON_INJECT=m
>>>>
>>>> I'll send a patch to add these configs later. Right now I'm trying to
>>>> investigate the failure when we are trying to inject the poison page by
>>>> madvise(MADV_HWPOISON). I'm getting device busy every single time. The test
>>>> fails as it doesn't expect the hugetlb memory to be busy. I'm not
>>>> sure if the poison handling code has issues or test isn't robust enough.
>>>>
>>>> ./hugetlb-read-hwpoison
>>>> Write/read chunk size=0x800
>>>>  ... HugeTLB read regression test...
>>>>  ...  ... expect to read 0x200000 bytes of data in total
>>>>  ...  ... actually read 0x200000 bytes of data in total
>>>>  ... HugeTLB read regression test...TEST_PASSED
>>>>  ... HugeTLB read HWPOISON test...
>>>> [    9.280854] Injecting memory failure for pfn 0x102f01 at process virtual
>>>> address 0x7f28ec101000
>>>> [    9.282029] Memory failure: 0x102f01: huge page still referenced by 511
>>>> users
>>>> [    9.282987] Memory failure: 0x102f01: recovery action for huge page: Failed
>>>>  ...  !!! MADV_HWPOISON failed: Device or resource busy
>>>>  ... HugeTLB read HWPOISON test...TEST_FAILED
>>>>
>>>> I'm testing on v6.7-rc8. Not sure if this was working previously or not.
>>>
>>> Thanks for reporting this, Usama!
>>>
>>> I am also able to repro MADV_HWPOISON failure at "501a06fe8e4c
>>> (akpm/mm-stable, mm-stable) zswap: memcontrol: implement zswap
>>> writeback disabling."
>>>
>>> Then I checked out the earliest commit "ba91e7e5d15a (HEAD -> Base)
>>> selftests/mm: add tests for HWPOISON hugetlbfs read". The
>>> MADV_HWPOISON injection works and the test passes:
>>>
>>>  ... HugeTLB read HWPOISON test...
>>>  ...  ... expect to read 0x101000 bytes of data in total
>>>  ...  !!! read failed: Input/output error
>>>  ...  ... actually read 0x101000 bytes of data in total
>>>  ... HugeTLB read HWPOISON test...TEST_PASSED
>>>  ... HugeTLB seek then read HWPOISON test...
>>>  ...  ... init val=4 with offset=0x102000
>>>  ...  ... expect to read 0xfe000 bytes of data in total
>>>  ...  ... actually read 0xfe000 bytes of data in total
>>>  ... HugeTLB seek then read HWPOISON test...TEST_PASSED
>>>  ...
>>>
>>> [ 2109.209225] Injecting memory failure for pfn 0x3190d01 at process
>>> virtual address 0x7f75e3101000
>>> [ 2109.209438] Memory failure: 0x3190d01: recovery action for huge
>>> page: Recovered
>>> ...
>>>
>>> I think something in between broke MADV_HWPOISON on hugetlbfs, and we
>>> should be able to figure it out via bisection (and of course by
>>> reading delta commits between them, probably related to page
>>> refcount).
>> Thank you for this information.
>>
>>>
>>> That being said, I will be on vacation from tomorrow until the end of
>>> next week. So I will get back to this after next weekend. Meanwhile if
>>> you want to go ahead and bisect the problematic commit, that will be
>>> very much appreciated.
>> I'll try to bisect and post here if I find something.
> Found the culprit commit by bisection:
> 
> a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3
> mm/filemap: remove hugetlb special casing in filemap.c

#regzbot title: hugetlbfs hwpoison handling
#regzbot introduced: a08c7193e4f1
#regzbot monitor:
https://lore.kernel.org/all/20240111191655.295530-1-sidhartha.kumar@oracle.com

> 
> hugetlb-read-hwpoison started failing from this patch. I've added the
> author of this patch to this bug report.
> 
>>
>>>
>>> Thanks,
>>> Jiaqi
>>>
>>>
>>>>
>>>> Regards,
>>>> Usama
>>>>
> 

-- 
BR,
Muhammad Usama Anjum



* Re: [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read
  2024-01-12  6:16           ` Muhammad Usama Anjum
@ 2024-01-19 10:10             ` Linux regression tracking #update (Thorsten Leemhuis)
  0 siblings, 0 replies; 19+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2024-01-19 10:10 UTC (permalink / raw)
  To: Linux Regressions; +Cc: linux-mm, linux-kernel

On 12.01.24 07:16, Muhammad Usama Anjum wrote:
> On 1/10/24 3:15 PM, Muhammad Usama Anjum wrote:
>> On 1/10/24 11:49 AM, Muhammad Usama Anjum wrote:
>>> On 1/6/24 2:13 AM, Jiaqi Yan wrote:
>>>> On Thu, Jan 4, 2024 at 10:27 PM Muhammad Usama Anjum
>>>> <usama.anjum@collabora.com> wrote:
>>>>>
>>>>> I'm trying to convert this test to TAP as I think the failures sometimes go
>>>>> unnoticed on CI systems if we only depend on the return value of the
>>>>> application. I've enabled the following configurations which aren't already
>>>>> present in tools/testing/selftests/mm/config:
>>>>> CONFIG_MEMORY_FAILURE=y
>>>>> CONFIG_HWPOISON_INJECT=m

> #regzbot title: hugetlbfs hwpoison handling
> #regzbot introduced: a08c7193e4f1
> #regzbot monitor:
> https://lore.kernel.org/all/20240111191655.295530-1-sidhartha.kumar@oracle.com

#regzbot fix: fs/hugetlbfs/inode.c: mm/memory-failure.c: fix hugetlbfs
hwpoison handling
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.




end of thread, other threads:[~2024-01-19 10:11 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-13  0:18 [PATCH v4 0/4] Improve hugetlbfs read on HWPOISON hugepages Jiaqi Yan
2023-07-13  0:18 ` [PATCH v4 1/4] mm/hwpoison: delete all entries before traversal in __folio_free_raw_hwp Jiaqi Yan
2023-07-13  0:18 ` [PATCH v4 2/4] mm/hwpoison: check if a raw page in a hugetlb folio is raw HWPOISON Jiaqi Yan
2023-07-13  0:18 ` [PATCH v4 3/4] hugetlbfs: improve read HWPOISON hugepage Jiaqi Yan
2023-07-13  0:18 ` [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read Jiaqi Yan
2024-01-05  6:27   ` Muhammad Usama Anjum
2024-01-05 21:13     ` Jiaqi Yan
2024-01-10  6:49       ` Muhammad Usama Anjum
2024-01-10 10:15         ` Muhammad Usama Anjum
2024-01-11  2:32           ` Sidhartha Kumar
2024-01-11  8:48             ` Muhammad Usama Anjum
2024-01-11 17:34               ` Jiaqi Yan
2024-01-11 17:51                 ` Sidhartha Kumar
2024-01-11 18:03                   ` Matthew Wilcox
2024-01-11 18:11                     ` Sidhartha Kumar
2024-01-11 18:30                       ` Jiaqi Yan
2024-01-11 18:36                         ` Sidhartha Kumar
2024-01-12  6:16           ` Muhammad Usama Anjum
2024-01-19 10:10             ` Linux regression tracking #update (Thorsten Leemhuis)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox