* [PATCH v3 1/4] mm/hwpoison: delete all entries before traversal in __folio_free_raw_hwp
2023-07-07 20:19 [PATCH v3 0/4] Improve hugetlbfs read on HWPOISON hugepages Jiaqi Yan
@ 2023-07-07 20:19 ` Jiaqi Yan
2023-07-08 2:40 ` Miaohe Lin
2023-07-07 20:19 ` [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON Jiaqi Yan
` (2 subsequent siblings)
3 siblings, 1 reply; 17+ messages in thread
From: Jiaqi Yan @ 2023-07-07 20:19 UTC (permalink / raw)
To: akpm, mike.kravetz, naoya.horiguchi
Cc: songmuchun, shy828301, linmiaohe, linux-mm, linux-kernel,
duenwen, axelrasmussen, jthoughton, Jiaqi Yan
Traversal on llist (e.g. llist_for_each_safe) is only safe AFTER entries
are deleted from the llist. Correct the way __folio_free_raw_hwp deletes
and frees raw_hwp_page entries in raw_hwp_list: first llist_del_all, then
kfree within llist_for_each_safe.
As of today, concurrent adding, deleting, and traversal on raw_hwp_list
from hugetlb.c and/or memory-failure.c are fine with each other. Note
this is guaranteed partly by the lock-free nature of llist, and partly
by holding hugetlb_lock and/or mf_mutex. For example, as llist_del_all
is lock-free with itself, folio_clear_hugetlb_hwpoison()s from
__update_and_free_hugetlb_folio and memory_failure won't need explicit
locking when freeing the raw_hwp_list. New code that manipulates
raw_hwp_list must be careful to ensure the concurrency correctness.
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
mm/memory-failure.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index e245191e6b04..a08677dcf953 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1829,12 +1829,11 @@ static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
static unsigned long __folio_free_raw_hwp(struct folio *folio, bool move_flag)
{
- struct llist_head *head;
- struct llist_node *t, *tnode;
+ struct llist_node *t, *tnode, *head;
unsigned long count = 0;
- head = raw_hwp_list_head(folio);
- llist_for_each_safe(tnode, t, head->first) {
+ head = llist_del_all(raw_hwp_list_head(folio));
+ llist_for_each_safe(tnode, t, head) {
struct raw_hwp_page *p = container_of(tnode, struct raw_hwp_page, node);
if (move_flag)
@@ -1844,7 +1843,6 @@ static unsigned long __folio_free_raw_hwp(struct folio *folio, bool move_flag)
kfree(p);
count++;
}
- llist_del_all(head);
return count;
}
--
2.41.0.255.g8b1d071c50-goog
^ permalink raw reply [flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/4] mm/hwpoison: delete all entries before traversal in __folio_free_raw_hwp
2023-07-07 20:19 ` [PATCH v3 1/4] mm/hwpoison: delete all entries before traversal in __folio_free_raw_hwp Jiaqi Yan
@ 2023-07-08 2:40 ` Miaohe Lin
0 siblings, 0 replies; 17+ messages in thread
From: Miaohe Lin @ 2023-07-08 2:40 UTC (permalink / raw)
To: Jiaqi Yan, mike.kravetz, naoya.horiguchi, Andrew Morton
Cc: songmuchun, shy828301, linux-mm, linux-kernel, duenwen,
axelrasmussen, jthoughton
On 2023/7/8 4:19, Jiaqi Yan wrote:
> Traversal on llist (e.g. llist_for_each_safe) is only safe AFTER entries
> are deleted from the llist. Correct the way __folio_free_raw_hwp deletes
> and frees raw_hwp_page entries in raw_hwp_list: first llist_del_all, then
> kfree within llist_for_each_safe.
>
> As of today, concurrent adding, deleting, and traversal on raw_hwp_list
> from hugetlb.c and/or memory-failure.c are fine with each other. Note
I think there's a race on freeing the raw_hwp_list between unpoison_memory and __update_and_free_hugetlb_folio:
unpoison_memory                   __update_and_free_hugetlb_folio
  if (folio_test_hwpoison)
                                    folio_clear_hugetlb_hwpoison
  folio_free_raw_hwp                  folio_free_raw_hwp
  folio_test_clear_hwpoison
unpoison_memory and __update_and_free_hugetlb_folio can traverse and free the raw_hwp_list
at the same time. And I believe your patch will fix the problem. Thanks.
> this is guaranteed partly by the lock-free nature of llist, and partly
> by holding hugetlb_lock and/or mf_mutex. For example, as llist_del_all
> is lock-free with itself, folio_clear_hugetlb_hwpoison()s from
> __update_and_free_hugetlb_folio and memory_failure won't need explicit
> locking when freeing the raw_hwp_list. New code that manipulates
> raw_hwp_list must be careful to ensure the concurrency correctness.
>
> Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Anyway, this patch looks good to me.
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Thanks.
* [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON
2023-07-07 20:19 [PATCH v3 0/4] Improve hugetlbfs read on HWPOISON hugepages Jiaqi Yan
2023-07-07 20:19 ` [PATCH v3 1/4] mm/hwpoison: delete all entries before traversal in __folio_free_raw_hwp Jiaqi Yan
@ 2023-07-07 20:19 ` Jiaqi Yan
2023-07-07 20:31 ` Matthew Wilcox
2023-07-08 2:57 ` Miaohe Lin
2023-07-07 20:19 ` [PATCH v3 3/4] hugetlbfs: improve read HWPOISON hugepage Jiaqi Yan
2023-07-07 20:19 ` [PATCH v3 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read Jiaqi Yan
3 siblings, 2 replies; 17+ messages in thread
From: Jiaqi Yan @ 2023-07-07 20:19 UTC (permalink / raw)
To: akpm, mike.kravetz, naoya.horiguchi
Cc: songmuchun, shy828301, linmiaohe, linux-mm, linux-kernel,
duenwen, axelrasmussen, jthoughton, Jiaqi Yan
Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
hugetlb folio is a raw HWPOISON page. This functionality relies on
RawHwpUnreliable not being set; otherwise the hugepage's raw HWPOISON
list becomes meaningless.
is_raw_hwp_subpage needs to hold hugetlb_lock in order to synchronize
with __get_huge_page_for_hwpoison, which iterates and inserts an entry
into raw_hwp_list. llist itself doesn't ensure insertion is synchronized
with the iteration used by __is_raw_hwp_subpage. Callers can minimize
the overhead of lock cycles by first checking if the folio's (head
page's) HWPOISON flag is set.
Export this functionality so that it can be used immediately in the read
operation for hugetlbfs.
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
include/linux/hugetlb.h | 19 +++++++++++++++++++
include/linux/mm.h | 7 +++++++
mm/hugetlb.c | 10 ++++++++++
mm/memory-failure.c | 34 ++++++++++++++++++++++++----------
4 files changed, 60 insertions(+), 10 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ca3c8e10f24a..4a745af98525 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1007,6 +1007,25 @@ void hugetlb_register_node(struct node *node);
void hugetlb_unregister_node(struct node *node);
#endif
+/*
+ * Struct raw_hwp_page represents information about "raw error page",
+ * constructing singly linked list from ->_hugetlb_hwpoison field of folio.
+ */
+struct raw_hwp_page {
+ struct llist_node node;
+ struct page *page;
+};
+
+static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
+{
+ return (struct llist_head *)&folio->_hugetlb_hwpoison;
+}
+
+/*
+ * Check if a given raw @subpage in a hugepage @folio is HWPOISON.
+ */
+bool is_raw_hwp_subpage(struct folio *folio, struct page *subpage);
+
#else /* CONFIG_HUGETLB_PAGE */
struct hstate {};
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 74f1be743ba2..edaa18b6f731 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3680,6 +3680,7 @@ extern const struct attribute_group memory_failure_attr_group;
extern void memory_failure_queue(unsigned long pfn, int flags);
extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
bool *migratable_cleared);
+extern bool __is_raw_hwp_subpage(struct folio *folio, struct page *subpage);
void num_poisoned_pages_inc(unsigned long pfn);
void num_poisoned_pages_sub(unsigned long pfn, long i);
struct task_struct *task_early_kill(struct task_struct *tsk, int force_early);
@@ -3694,6 +3695,12 @@ static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
return 0;
}
+static inline bool __is_raw_hwp_subpage(struct folio *folio,
+ struct page *subpage)
+{
+ return false;
+}
+
static inline void num_poisoned_pages_inc(unsigned long pfn)
{
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bce28cca73a1..9c608d2f6630 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7373,6 +7373,16 @@ int get_huge_page_for_hwpoison(unsigned long pfn, int flags,
return ret;
}
+bool is_raw_hwp_subpage(struct folio *folio, struct page *subpage)
+{
+ bool ret;
+
+ spin_lock_irq(&hugetlb_lock);
+ ret = __is_raw_hwp_subpage(folio, subpage);
+ spin_unlock_irq(&hugetlb_lock);
+ return ret;
+}
+
void folio_putback_active_hugetlb(struct folio *folio)
{
spin_lock_irq(&hugetlb_lock);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a08677dcf953..5b6c8ceb13c0 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1813,18 +1813,32 @@ EXPORT_SYMBOL_GPL(mf_dax_kill_procs);
#endif /* CONFIG_FS_DAX */
#ifdef CONFIG_HUGETLB_PAGE
-/*
- * Struct raw_hwp_page represents information about "raw error page",
- * constructing singly linked list from ->_hugetlb_hwpoison field of folio.
- */
-struct raw_hwp_page {
- struct llist_node node;
- struct page *page;
-};
-static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
+bool __is_raw_hwp_subpage(struct folio *folio, struct page *subpage)
{
- return (struct llist_head *)&folio->_hugetlb_hwpoison;
+ struct llist_head *raw_hwp_head;
+ struct raw_hwp_page *p, *tmp;
+ bool ret = false;
+
+ if (!folio_test_hwpoison(folio))
+ return false;
+
+ /*
+ * When RawHwpUnreliable is set, kernel lost track of which subpages
+ * are HWPOISON. So return as if ALL subpages are HWPOISONed.
+ */
+ if (folio_test_hugetlb_raw_hwp_unreliable(folio))
+ return true;
+
+ raw_hwp_head = raw_hwp_list_head(folio);
+ llist_for_each_entry_safe(p, tmp, raw_hwp_head->first, node) {
+ if (subpage == p->page) {
+ ret = true;
+ break;
+ }
+ }
+
+ return ret;
}
static unsigned long __folio_free_raw_hwp(struct folio *folio, bool move_flag)
--
2.41.0.255.g8b1d071c50-goog
* Re: [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON
2023-07-07 20:19 ` [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON Jiaqi Yan
@ 2023-07-07 20:31 ` Matthew Wilcox
2023-07-07 21:05 ` Andrew Morton
2023-07-10 0:21 ` Naoya Horiguchi
2023-07-08 2:57 ` Miaohe Lin
1 sibling, 2 replies; 17+ messages in thread
From: Matthew Wilcox @ 2023-07-07 20:31 UTC (permalink / raw)
To: Jiaqi Yan
Cc: akpm, mike.kravetz, naoya.horiguchi, songmuchun, shy828301,
linmiaohe, linux-mm, linux-kernel, duenwen, axelrasmussen,
jthoughton
On Fri, Jul 07, 2023 at 08:19:02PM +0000, Jiaqi Yan wrote:
> Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
This is incorrect naming. "subpage" was needed before we had the
folio concept, but it should not be used any more. We have folios
and pages now.
Also, abbreviating "hwpoison" as "hwp" seems like a bad idea to me.
hwp is already used as an acronym by acpi, intel_pstate, some clock
drivers, an ethernet driver, and a scsi driver.
* Re: [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON
2023-07-07 20:31 ` Matthew Wilcox
@ 2023-07-07 21:05 ` Andrew Morton
2023-07-10 0:21 ` Naoya Horiguchi
1 sibling, 0 replies; 17+ messages in thread
From: Andrew Morton @ 2023-07-07 21:05 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Jiaqi Yan, mike.kravetz, naoya.horiguchi, songmuchun, shy828301,
linmiaohe, linux-mm, linux-kernel, duenwen, axelrasmussen,
jthoughton
On Fri, 7 Jul 2023 21:31:39 +0100 Matthew Wilcox <willy@infradead.org> wrote:
> On Fri, Jul 07, 2023 at 08:19:02PM +0000, Jiaqi Yan wrote:
> > Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
>
> This is incorrect naming. "subpage" was needed before we had the
> folio concept, but it should not be used any more. We have folios
> and pages now.
>
> Also, abbreviating "hwpoison" as "hwp" seems like a bad idea to me.
> hwp is already used as an acronym by acpi, intel_pstate, some clock
> drivers, an ethernet driver, and a scsi driver.
Thanks, Matthew. I'll merge v3 so we hit next -next (which actually
won't be until Monday), and add a note that v4 is expected.
* Re: [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON
2023-07-07 20:31 ` Matthew Wilcox
2023-07-07 21:05 ` Andrew Morton
@ 2023-07-10 0:21 ` Naoya Horiguchi
2023-07-10 15:11 ` Jiaqi Yan
1 sibling, 1 reply; 17+ messages in thread
From: Naoya Horiguchi @ 2023-07-10 0:21 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Jiaqi Yan, akpm, mike.kravetz, naoya.horiguchi, songmuchun,
shy828301, linmiaohe, linux-mm, linux-kernel, duenwen,
axelrasmussen, jthoughton
On Fri, Jul 07, 2023 at 09:31:39PM +0100, Matthew Wilcox wrote:
> On Fri, Jul 07, 2023 at 08:19:02PM +0000, Jiaqi Yan wrote:
> > Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
>
> This is incorrect naming. "subpage" was needed before we had the
> folio concept, but it should not be used any more. We have folios
> and pages now.
I think we can address the raw hwpoison page by the offset in folio/hugepage
to eliminate the concept of "subpage".
>
> Also, abbreviating "hwpoison" as "hwp" seems like a bad idea to me.
> hwp is already used as an acronym by acpi, intel_pstate, some clock
> drivers, an ethernet driver, and a scsi driver.
I originally introduced the abbreviation "hwp" to avoid using a lengthy
function name such as "folio_test_hugetlb_raw_hwpoison_unreliable()."
Therefore, I prefer using "rawhwp" instead of a longer form like
"raw_hwpoison," although I don't expect any confusion between "hwp" and
"rawhwp."
As for "hwp_walk", another user of "hwp" in mm/memory-failure.c,
we can easily substitute it with "hwpoison_walk."
Thanks,
Naoya Horiguchi
* Re: [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON
2023-07-10 0:21 ` Naoya Horiguchi
@ 2023-07-10 15:11 ` Jiaqi Yan
2023-07-11 8:59 ` Naoya Horiguchi
0 siblings, 1 reply; 17+ messages in thread
From: Jiaqi Yan @ 2023-07-10 15:11 UTC (permalink / raw)
To: Naoya Horiguchi, Matthew Wilcox
Cc: akpm, mike.kravetz, naoya.horiguchi, songmuchun, shy828301,
linmiaohe, linux-mm, linux-kernel, duenwen, axelrasmussen,
jthoughton
On Sun, Jul 9, 2023 at 5:21 PM Naoya Horiguchi
<naoya.horiguchi@linux.dev> wrote:
>
> On Fri, Jul 07, 2023 at 09:31:39PM +0100, Matthew Wilcox wrote:
> > On Fri, Jul 07, 2023 at 08:19:02PM +0000, Jiaqi Yan wrote:
> > > Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
> >
> > This is incorrect naming. "subpage" was needed before we had the
> > folio concept, but it should not be used any more. We have folios
> > and pages now.
>
Thanks for your comment, Matthew.
> I think we can address the raw hwpoison page by the offset in folio/hugepage
> to eliminate the concept of "subpage".
>
> >
> > Also, abbreviating "hwpoison" as "hwp" seems like a bad idea to me.
> > hwp is already used as an acronym by acpi, intel_pstate, some clock
> > drivers, an ethernet driver, and a scsi driver.
>
> I originally introduced the abbreviation "hwp" to avoid using a lengthy
> function name such as "folio_test_hugetlb_raw_hwpoison_unreliable()."
> Therefore, I prefer using "rawhwp" instead of a longer form like
> "raw_hwpoison," although I don't expect any confusion between "hwp" and
> "rawhwp."
These are names in my mind, what do you think?
* is_rawhwp_page_in_hugepage
* is_raw_hwpoison_page_in_hugepage // I prefer this one
* folio_test_hugetlb_raw_hwpoison_page
> As for "hwp_walk", another user of "hwp" in in mm/memory-failure.c,
> we can easily substitute it with "hwpoison_walk."
In this "hwp_walk" case, I also prefer "hwpoison" over "hwp". I can
create a separate renaming patch.
>
> Thanks,
> Naoya Horiguchi
* Re: [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON
2023-07-10 15:11 ` Jiaqi Yan
@ 2023-07-11 8:59 ` Naoya Horiguchi
0 siblings, 0 replies; 17+ messages in thread
From: Naoya Horiguchi @ 2023-07-11 8:59 UTC (permalink / raw)
To: Jiaqi Yan
Cc: Matthew Wilcox, akpm, mike.kravetz, naoya.horiguchi, songmuchun,
shy828301, linmiaohe, linux-mm, linux-kernel, duenwen,
axelrasmussen, jthoughton
On Mon, Jul 10, 2023 at 08:11:48AM -0700, Jiaqi Yan wrote:
> On Sun, Jul 9, 2023 at 5:21 PM Naoya Horiguchi
> <naoya.horiguchi@linux.dev> wrote:
> >
> > On Fri, Jul 07, 2023 at 09:31:39PM +0100, Matthew Wilcox wrote:
> > > On Fri, Jul 07, 2023 at 08:19:02PM +0000, Jiaqi Yan wrote:
> > > > Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
> > >
> > > This is incorrect naming. "subpage" was needed before we had the
> > > folio concept, but it should not be used any more. We have folios
> > > and pages now.
> >
>
> Thanks for your comment, Matthew.
>
> > I think we can address the raw hwpoison page by the offset in folio/hugepage
> > to eliminate the concept of "subpage".
> >
> > >
> > > Also, abbreviating "hwpoison" as "hwp" seems like a bad idea to me.
> > > hwp is already used as an acronym by acpi, intel_pstate, some clock
> > > drivers, an ethernet driver, and a scsi driver.
> >
> > I originally introduced the abbreviation "hwp" to avoid using a lengthy
> > function name such as "folio_test_hugetlb_raw_hwpoison_unreliable()."
> > Therefore, I prefer using "rawhwp" instead of a longer form like
> > "raw_hwpoison," although I don't expect any confusion between "hwp" and
> > "rawhwp."
>
> These are names in my mind, what do you think?
> * is_rawhwp_page_in_hugepage
> * is_raw_hwpoison_page_in_hugepage // I prefer this one
This one is fine to me.
> * folio_test_hugetlb_raw_hwpoison_page
>
> > As for "hwp_walk", another user of "hwp" in in mm/memory-failure.c,
> > we can easily substitute it with "hwpoison_walk."
>
> In this "hwp_walk" case, I also prefer "hwpoison" than "hwp". I can
> create a separate renaming patch.
Great, thank you.
- Naoya Horiguchi
* Re: [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON
2023-07-07 20:19 ` [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON Jiaqi Yan
2023-07-07 20:31 ` Matthew Wilcox
@ 2023-07-08 2:57 ` Miaohe Lin
2023-07-10 15:16 ` Jiaqi Yan
1 sibling, 1 reply; 17+ messages in thread
From: Miaohe Lin @ 2023-07-08 2:57 UTC (permalink / raw)
To: Jiaqi Yan, akpm, mike.kravetz, naoya.horiguchi
Cc: songmuchun, shy828301, linux-mm, linux-kernel, duenwen,
axelrasmussen, jthoughton
On 2023/7/8 4:19, Jiaqi Yan wrote:
> Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
> hugetlb folio is a raw HWPOISON page. This functionality relies on
> RawHwpUnreliable to be not set; otherwise hugepage's raw HWPOISON list
> becomes meaningless.
>
> is_raw_hwp_subpage needs to hold hugetlb_lock in order to synchronize
> with __get_huge_page_for_hwpoison, who iterates and inserts an entry to
> raw_hwp_list. llist itself doesn't ensure insertion is synchornized with
> the iterating used by __is_raw_hwp_list. Caller can minimize the
> overhead of lock cycles by first checking if folio / head page's
> HWPOISON flag is set.
>
> Exports this functionality to be immediately used in the read operation
> for hugetlbfs.
>
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> ---
> include/linux/hugetlb.h | 19 +++++++++++++++++++
> include/linux/mm.h | 7 +++++++
> mm/hugetlb.c | 10 ++++++++++
> mm/memory-failure.c | 34 ++++++++++++++++++++++++----------
> 4 files changed, 60 insertions(+), 10 deletions(-)
> ...
> -static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
> +bool __is_raw_hwp_subpage(struct folio *folio, struct page *subpage)
> {
> - return (struct llist_head *)&folio->_hugetlb_hwpoison;
> + struct llist_head *raw_hwp_head;
> + struct raw_hwp_page *p, *tmp;
> + bool ret = false;
> +
> + if (!folio_test_hwpoison(folio))
> + return false;
> +
> + /*
> + * When RawHwpUnreliable is set, kernel lost track of which subpages
> + * are HWPOISON. So return as if ALL subpages are HWPOISONed.
> + */
> + if (folio_test_hugetlb_raw_hwp_unreliable(folio))
> + return true;
> +
> + raw_hwp_head = raw_hwp_list_head(folio);
> + llist_for_each_entry_safe(p, tmp, raw_hwp_head->first, node) {
Since we don't free the raw_hwp_list, does llist_for_each_entry work the same as llist_for_each_entry_safe?
> + if (subpage == p->page) {
> + ret = true;
> + break;
> + }
> + }
> +
> + return ret;
> }
It seems there's a race between __is_raw_hwp_subpage and unpoison_memory:
unpoison_memory                       __is_raw_hwp_subpage
  if (!folio_test_hwpoison(folio))      -- hwpoison is set
  folio_free_raw_hwp                    llist_for_each_entry_safe raw_hwp_list
    llist_del_all                       ..
  folio_test_clear_hwpoison
But __is_raw_hwp_subpage is used in hugetlbfs; unpoison_memory couldn't reach here because there's a
folio_mapping == NULL check before folio_free_raw_hwp.
Anyway, this patch looks good to me.
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Thanks.
* Re: [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON
2023-07-08 2:57 ` Miaohe Lin
@ 2023-07-10 15:16 ` Jiaqi Yan
2023-07-11 17:05 ` Jiaqi Yan
0 siblings, 1 reply; 17+ messages in thread
From: Jiaqi Yan @ 2023-07-10 15:16 UTC (permalink / raw)
To: Miaohe Lin
Cc: akpm, mike.kravetz, naoya.horiguchi, songmuchun, shy828301,
linux-mm, linux-kernel, duenwen, axelrasmussen, jthoughton
On Fri, Jul 7, 2023 at 7:57 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> On 2023/7/8 4:19, Jiaqi Yan wrote:
> > Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
> > hugetlb folio is a raw HWPOISON page. This functionality relies on
> > RawHwpUnreliable to be not set; otherwise hugepage's raw HWPOISON list
> > becomes meaningless.
> >
> > is_raw_hwp_subpage needs to hold hugetlb_lock in order to synchronize
> > with __get_huge_page_for_hwpoison, who iterates and inserts an entry to
> > raw_hwp_list. llist itself doesn't ensure insertion is synchornized with
> > the iterating used by __is_raw_hwp_list. Caller can minimize the
> > overhead of lock cycles by first checking if folio / head page's
> > HWPOISON flag is set.
> >
> > Exports this functionality to be immediately used in the read operation
> > for hugetlbfs.
> >
> > Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> > Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> > ---
> > include/linux/hugetlb.h | 19 +++++++++++++++++++
> > include/linux/mm.h | 7 +++++++
> > mm/hugetlb.c | 10 ++++++++++
> > mm/memory-failure.c | 34 ++++++++++++++++++++++++----------
> > 4 files changed, 60 insertions(+), 10 deletions(-)
> > ...
> > -static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
> > +bool __is_raw_hwp_subpage(struct folio *folio, struct page *subpage)
> > {
> > - return (struct llist_head *)&folio->_hugetlb_hwpoison;
> > + struct llist_head *raw_hwp_head;
> > + struct raw_hwp_page *p, *tmp;
> > + bool ret = false;
> > +
> > + if (!folio_test_hwpoison(folio))
> > + return false;
> > +
> > + /*
> > + * When RawHwpUnreliable is set, kernel lost track of which subpages
> > + * are HWPOISON. So return as if ALL subpages are HWPOISONed.
> > + */
> > + if (folio_test_hugetlb_raw_hwp_unreliable(folio))
> > + return true;
> > +
> > + raw_hwp_head = raw_hwp_list_head(folio);
> > + llist_for_each_entry_safe(p, tmp, raw_hwp_head->first, node) {
>
> Since we don't free the raw_hwp_list, does llist_for_each_entry works same as llist_for_each_entry_safe?
>
> > + if (subpage == p->page) {
> > + ret = true;
> > + break;
> > + }
> > + }
> > +
> > + return ret;
> > }
>
> It seems there's a race between __is_raw_hwp_subpage and unpoison_memory:
> unpoison_memory                       __is_raw_hwp_subpage
>   if (!folio_test_hwpoison(folio))      -- hwpoison is set
>   folio_free_raw_hwp                    llist_for_each_entry_safe raw_hwp_list
>     llist_del_all                       ..
>   folio_test_clear_hwpoison
>
Thanks Miaohe for raising this concern.
> But __is_raw_hwp_subpage is used in hugetlbfs, unpoison_memory couldn't reach here because there's a
> folio_mapping == NULL check before folio_free_raw_hwp.
I agree. But in the near future I do want to make __is_raw_hwp_subpage
work for shared-mapping hugetlb, so it would be nice for it to work with
unpoison_memory. It doesn't seem to me that holding mf_mutex in
__is_raw_hwp_subpage is nice or even absolutely correct. Let me think
whether I can come up with something in v4.
>
> Anyway, this patch looks good to me.
>
> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
> Thanks.
>
* Re: [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON
2023-07-10 15:16 ` Jiaqi Yan
@ 2023-07-11 17:05 ` Jiaqi Yan
2023-07-11 18:01 ` Mike Kravetz
0 siblings, 1 reply; 17+ messages in thread
From: Jiaqi Yan @ 2023-07-11 17:05 UTC (permalink / raw)
To: Miaohe Lin
Cc: akpm, mike.kravetz, naoya.horiguchi, songmuchun, shy828301,
linux-mm, linux-kernel, duenwen, axelrasmussen, jthoughton
On Mon, Jul 10, 2023 at 8:16 AM Jiaqi Yan <jiaqiyan@google.com> wrote:
>
> On Fri, Jul 7, 2023 at 7:57 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
> >
> > On 2023/7/8 4:19, Jiaqi Yan wrote:
> > > Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a
> > > hugetlb folio is a raw HWPOISON page. This functionality relies on
> > > RawHwpUnreliable to be not set; otherwise hugepage's raw HWPOISON list
> > > becomes meaningless.
> > >
> > > is_raw_hwp_subpage needs to hold hugetlb_lock in order to synchronize
> > > with __get_huge_page_for_hwpoison, who iterates and inserts an entry to
> > > raw_hwp_list. llist itself doesn't ensure insertion is synchornized with
> > > the iterating used by __is_raw_hwp_list. Caller can minimize the
> > > overhead of lock cycles by first checking if folio / head page's
> > > HWPOISON flag is set.
> > >
> > > Exports this functionality to be immediately used in the read operation
> > > for hugetlbfs.
> > >
> > > Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> > > Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > > Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> > > ---
> > > include/linux/hugetlb.h | 19 +++++++++++++++++++
> > > include/linux/mm.h | 7 +++++++
> > > mm/hugetlb.c | 10 ++++++++++
> > > mm/memory-failure.c | 34 ++++++++++++++++++++++++----------
> > > 4 files changed, 60 insertions(+), 10 deletions(-)
> > > ...
> > > -static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
> > > +bool __is_raw_hwp_subpage(struct folio *folio, struct page *subpage)
> > > {
> > > - return (struct llist_head *)&folio->_hugetlb_hwpoison;
> > > + struct llist_head *raw_hwp_head;
> > > + struct raw_hwp_page *p, *tmp;
> > > + bool ret = false;
> > > +
> > > + if (!folio_test_hwpoison(folio))
> > > + return false;
> > > +
> > > + /*
> > > + * When RawHwpUnreliable is set, kernel lost track of which subpages
> > > + * are HWPOISON. So return as if ALL subpages are HWPOISONed.
> > > + */
> > > + if (folio_test_hugetlb_raw_hwp_unreliable(folio))
> > > + return true;
> > > +
> > > + raw_hwp_head = raw_hwp_list_head(folio);
> > > + llist_for_each_entry_safe(p, tmp, raw_hwp_head->first, node) {
> >
> > Since we don't free the raw_hwp_list, does llist_for_each_entry works same as llist_for_each_entry_safe?
Sorry I missed this comment. Yes, they are the same, but
llist_for_each_entry doesn't need "tmp". I will switch to
llist_for_each_entry in v4.
>
> >
> > > + if (subpage == p->page) {
> > > + ret = true;
> > > + break;
> > > + }
> > > + }
> > > +
> > > + return ret;
> > > }
> >
> > It seems there's a race between __is_raw_hwp_subpage and unpoison_memory:
> > unpoison_memory                       __is_raw_hwp_subpage
> >   if (!folio_test_hwpoison(folio))      -- hwpoison is set
> >   folio_free_raw_hwp                    llist_for_each_entry_safe raw_hwp_list
> >     llist_del_all                       ..
> >   folio_test_clear_hwpoison
> >
>
> Thanks Miaohe for raising this concern.
>
> > But __is_raw_hwp_subpage is used in hugetlbfs, unpoison_memory couldn't reach here because there's a
> > folio_mapping == NULL check before folio_free_raw_hwp.
>
> I agree. But in near future I do want to make __is_raw_hwp_subpage
> work for shared-mapping hugetlb, so it would be nice to work with
> unpoison_memory. It doesn't seem to me that holding mf_mutex in
> __is_raw_hwp_subpage is nice or even absolutely correct. Let me think
> if I can come up with something in v4.
On second thought, if __is_raw_hwp_subpage simply takes mf_mutex
before llist_for_each_entry, it will introduce a deadlock:
unpoison_memory                  __is_raw_hwp_subpage
  held mf_mutex                    held hugetlb_lock
  get_hwpoison_hugetlb_folio       attempts mf_mutex
    attempts hugetlb lock
Not for this patch series, but for the future: is it a good idea to make
mf_mutex available to hugetlb code, and then enforce the locking order
to be mf_mutex first, hugetlb_lock second? I believe this is the
current locking pattern / order for try_memory_failure_hugetlb.
>
> >
> > Anyway, this patch looks good to me.
> >
> > Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
> > Thanks.
> >
2023-07-11 17:05 ` Jiaqi Yan
@ 2023-07-11 18:01 ` Mike Kravetz
2023-07-11 22:22 ` Jiaqi Yan
2023-07-12 2:25 ` Miaohe Lin
0 siblings, 2 replies; 17+ messages in thread
From: Mike Kravetz @ 2023-07-11 18:01 UTC (permalink / raw)
To: Jiaqi Yan
Cc: Miaohe Lin, akpm, naoya.horiguchi, songmuchun, shy828301,
linux-mm, linux-kernel, duenwen, axelrasmussen, jthoughton
On 07/11/23 10:05, Jiaqi Yan wrote:
> On Mon, Jul 10, 2023 at 8:16 AM Jiaqi Yan <jiaqiyan@google.com> wrote:
> > On Fri, Jul 7, 2023 at 7:57 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
> > > On 2023/7/8 4:19, Jiaqi Yan wrote:
> > >
> > > > + if (subpage == p->page) {
> > > > + ret = true;
> > > > + break;
> > > > + }
> > > > + }
> > > > +
> > > > + return ret;
> > > > }
> > >
> > > It seems there's a race between __is_raw_hwp_subpage and unpoison_memory:
> > > unpoison_memory                       __is_raw_hwp_subpage
> > >   if (!folio_test_hwpoison(folio))      -- hwpoison is set
> > >   folio_free_raw_hwp                    llist_for_each_entry_safe raw_hwp_list
> > >     llist_del_all                       ..
> > >   folio_test_clear_hwpoison
> > >
> >
> > Thanks Miaohe for raising this concern.
> >
> > > But __is_raw_hwp_subpage is used in hugetlbfs, unpoison_memory couldn't reach here because there's a
> > > folio_mapping == NULL check before folio_free_raw_hwp.
> >
> > I agree. But in near future I do want to make __is_raw_hwp_subpage
> > work for shared-mapping hugetlb, so it would be nice to work with
> > unpoison_memory. It doesn't seem to me that holding mf_mutex in
> > __is_raw_hwp_subpage is nice or even absolutely correct. Let me think
> > if I can come up with something in v4.
>
> At my 2nd thought, if __is_raw_hwp_subpage simply takes mf_mutex
> before llist_for_each_entry, it will introduce a deadlock:
>
> unpoison_memory                  __is_raw_hwp_subpage
>   held mf_mutex                    held hugetlb_lock
>   get_hwpoison_hugetlb_folio       attempts mf_mutex
>     attempts hugetlb lock
>
> Not for this patch series, but for future, is it a good idea to make
> mf_mutex available to hugetlb code? Then enforce the order of locking
> to be mf_mutex first, hugetlb_lock second? I believe this is the
> current locking pattern / order for try_memory_failure_hugetlb.
I think only holding mf_mutex in __is_raw_hwp_subpage would be sufficient
to prevent races with unpoison_memory. memory failure code needs to take
both mf_mutex and hugetlb_lock. The hugetlb lock is to prevent hugetlb
page state changes. IIUC, __is_raw_hwp_subpage is only taking hugetlb_lock
to prevent races with memory failure code.
Of course, I could be missing something as there are subtle issues with
locking in the memory failure code.
--
Mike Kravetz
^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON
2023-07-11 18:01 ` Mike Kravetz
@ 2023-07-11 22:22 ` Jiaqi Yan
2023-07-12 2:25 ` Miaohe Lin
1 sibling, 0 replies; 17+ messages in thread
From: Jiaqi Yan @ 2023-07-11 22:22 UTC (permalink / raw)
To: Mike Kravetz
Cc: Miaohe Lin, akpm, naoya.horiguchi, songmuchun, shy828301,
linux-mm, linux-kernel, duenwen, axelrasmussen, jthoughton
On Tue, Jul 11, 2023 at 11:02 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 07/11/23 10:05, Jiaqi Yan wrote:
> > On Mon, Jul 10, 2023 at 8:16 AM Jiaqi Yan <jiaqiyan@google.com> wrote:
> > > On Fri, Jul 7, 2023 at 7:57 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
> > > > On 2023/7/8 4:19, Jiaqi Yan wrote:
> > > >
> > > > > + if (subpage == p->page) {
> > > > > + ret = true;
> > > > > + break;
> > > > > + }
> > > > > + }
> > > > > +
> > > > > + return ret;
> > > > > }
> > > >
> > > > It seems there's a race between __is_raw_hwp_subpage and unpoison_memory:
> > > > unpoison_memory                    __is_raw_hwp_subpage
> > > >  if (!folio_test_hwpoison(folio))   -- hwpoison is set
> > > >  folio_free_raw_hwp                 llist_for_each_entry_safe raw_hwp_list
> > > >   llist_del_all                     ..
> > > >  folio_test_clear_hwpoison
> > > >
> > >
> > > Thanks Miaohe for raising this concern.
> > >
> > > > But __is_raw_hwp_subpage is used in hugetlbfs, unpoison_memory couldn't reach here because there's a
> > > > folio_mapping == NULL check before folio_free_raw_hwp.
> > >
> > > I agree. But in near future I do want to make __is_raw_hwp_subpage
> > > work for shared-mapping hugetlb, so it would be nice to work with
> > > unpoison_memory. It doesn't seem to me that holding mf_mutex in
> > > __is_raw_hwp_subpage is nice or even absolutely correct. Let me think
> > > if I can come up with something in v4.
> >
> > At my 2nd thought, if __is_raw_hwp_subpage simply takes mf_mutex
> > before llist_for_each_entry, it will introduce a deadlock:
> >
> > unpoison_memory                 __is_raw_hwp_subpage
> >  held mf_mutex                   held hugetlb_lock
> >  get_hwpoison_hugetlb_folio      attempts mf_mutex
> >   attempts hugetlb_lock
> >
> > Not for this patch series, but for future, is it a good idea to make
> > mf_mutex available to hugetlb code? Then enforce the order of locking
> > to be mf_mutex first, hugetlb_lock second? I believe this is the
> > current locking pattern / order for try_memory_failure_hugetlb.
>
> I think only holding mf_mutex in __is_raw_hwp_subpage would be sufficient
> to prevent races with unpoison_memory. memory failure code needs to take
> both mf_mutex and hugetlb_lock. The hugetlb lock is to prevent hugetlb
> page state changes. IIUC, __is_raw_hwp_subpage is only taking hugetlb_lock
> to prevent races with memory failure code.
Thanks Mike, I think mf_mutex is a simple and correct solution. I will
drop hugetlb_lock from __is_raw_hwp_subpage in v4.
>
> Of course, I could be missing something as there are subtle issues with
> locking in the memory failure code.
> --
> Mike Kravetz
* Re: [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON
2023-07-11 18:01 ` Mike Kravetz
2023-07-11 22:22 ` Jiaqi Yan
@ 2023-07-12 2:25 ` Miaohe Lin
1 sibling, 0 replies; 17+ messages in thread
From: Miaohe Lin @ 2023-07-12 2:25 UTC (permalink / raw)
To: Mike Kravetz, Jiaqi Yan
Cc: akpm, naoya.horiguchi, songmuchun, shy828301, linux-mm,
linux-kernel, duenwen, axelrasmussen, jthoughton
On 2023/7/12 2:01, Mike Kravetz wrote:
> On 07/11/23 10:05, Jiaqi Yan wrote:
>> On Mon, Jul 10, 2023 at 8:16 AM Jiaqi Yan <jiaqiyan@google.com> wrote:
>>> On Fri, Jul 7, 2023 at 7:57 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>> On 2023/7/8 4:19, Jiaqi Yan wrote:
>>>>
>>>>> + if (subpage == p->page) {
>>>>> + ret = true;
>>>>> + break;
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + return ret;
>>>>> }
>>>>
>>>> It seems there's a race between __is_raw_hwp_subpage and unpoison_memory:
>>>> unpoison_memory                    __is_raw_hwp_subpage
>>>>  if (!folio_test_hwpoison(folio))   -- hwpoison is set
>>>>  folio_free_raw_hwp                 llist_for_each_entry_safe raw_hwp_list
>>>>   llist_del_all                     ..
>>>>  folio_test_clear_hwpoison
>>>>
>>>
>>> Thanks Miaohe for raising this concern.
>>>
>>>> But __is_raw_hwp_subpage is used in hugetlbfs, unpoison_memory couldn't reach here because there's a
>>>> folio_mapping == NULL check before folio_free_raw_hwp.
>>>
>>> I agree. But in near future I do want to make __is_raw_hwp_subpage
>>> work for shared-mapping hugetlb, so it would be nice to work with
>>> unpoison_memory. It doesn't seem to me that holding mf_mutex in
>>> __is_raw_hwp_subpage is nice or even absolutely correct. Let me think
>>> if I can come up with something in v4.
>>
>> At my 2nd thought, if __is_raw_hwp_subpage simply takes mf_mutex
>> before llist_for_each_entry, it will introduce a deadlock:
>>
>> unpoison_memory                 __is_raw_hwp_subpage
>>  held mf_mutex                   held hugetlb_lock
>>  get_hwpoison_hugetlb_folio      attempts mf_mutex
>>   attempts hugetlb_lock
>>
>> Not for this patch series, but for future, is it a good idea to make
>> mf_mutex available to hugetlb code? Then enforce the order of locking
>> to be mf_mutex first, hugetlb_lock second? I believe this is the
>> current locking pattern / order for try_memory_failure_hugetlb.
>
> I think only holding mf_mutex in __is_raw_hwp_subpage would be sufficient
> to prevent races with unpoison_memory. memory failure code needs to take
Since soft_offline_page, memory_failure and unpoison_memory all hold mf_mutex,
I think this should be enough to prevent races between them too.
Thanks.
* [PATCH v3 3/4] hugetlbfs: improve read HWPOISON hugepage
2023-07-07 20:19 [PATCH v3 0/4] Improve hugetlbfs read on HWPOISON hugepages Jiaqi Yan
2023-07-07 20:19 ` [PATCH v3 1/4] mm/hwpoison: delete all entries before traversal in __folio_free_raw_hwp Jiaqi Yan
2023-07-07 20:19 ` [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON Jiaqi Yan
@ 2023-07-07 20:19 ` Jiaqi Yan
2023-07-07 20:19 ` [PATCH v3 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read Jiaqi Yan
3 siblings, 0 replies; 17+ messages in thread
From: Jiaqi Yan @ 2023-07-07 20:19 UTC (permalink / raw)
To: akpm, mike.kravetz, naoya.horiguchi
Cc: songmuchun, shy828301, linmiaohe, linux-mm, linux-kernel,
duenwen, axelrasmussen, jthoughton, Jiaqi Yan
When a hugepage contains HWPOISON pages, read() fails to read any byte
of the hugepage and returns -EIO, although many bytes in the HWPOISON
hugepage are readable.
Improve this by allowing hugetlbfs_read_iter to return as many bytes as
possible. For a requested range [offset, offset + len) that contains a
HWPOISON page, return [offset, first HWPOISON page addr); the next read
attempt will fail and return -EIO.
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
fs/hugetlbfs/inode.c | 58 +++++++++++++++++++++++++++++++++++++++-----
1 file changed, 52 insertions(+), 6 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 7b17ccfa039d..c2b807d37f85 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -282,6 +282,42 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
}
#endif
+/*
+ * Someone wants to read @bytes from a HWPOISON hugetlb @page from @offset.
+ * Returns the maximum number of bytes one can read without touching the 1st raw
+ * HWPOISON subpage.
+ *
+ * The implementation borrows the iteration logic from copy_page_to_iter*.
+ */
+static size_t adjust_range_hwpoison(struct page *page, size_t offset, size_t bytes)
+{
+ size_t n = 0;
+ size_t res = 0;
+ struct folio *folio = page_folio(page);
+
+ /* First subpage to start the loop. */
+ page += offset / PAGE_SIZE;
+ offset %= PAGE_SIZE;
+ while (1) {
+ if (is_raw_hwp_subpage(folio, page))
+ break;
+
+ /* Safe to read n bytes without touching HWPOISON subpage. */
+ n = min(bytes, (size_t)PAGE_SIZE - offset);
+ res += n;
+ bytes -= n;
+ if (!bytes || !n)
+ break;
+ offset += n;
+ if (offset == PAGE_SIZE) {
+ page++;
+ offset = 0;
+ }
+ }
+
+ return res;
+}
+
/*
* Support for read() - Find the page attached to f_mapping and copy out the
* data. This provides functionality similar to filemap_read().
@@ -300,7 +336,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
while (iov_iter_count(to)) {
struct page *page;
- size_t nr, copied;
+ size_t nr, copied, want;
/* nr is the maximum number of bytes to copy from this page */
nr = huge_page_size(h);
@@ -328,16 +364,26 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
} else {
unlock_page(page);
- if (PageHWPoison(page)) {
- put_page(page);
- retval = -EIO;
- break;
+ if (!PageHWPoison(page))
+ want = nr;
+ else {
+ /*
+ * Adjust how many bytes safe to read without
+ * touching the 1st raw HWPOISON subpage after
+ * offset.
+ */
+ want = adjust_range_hwpoison(page, offset, nr);
+ if (want == 0) {
+ put_page(page);
+ retval = -EIO;
+ break;
+ }
}
/*
* We have the page, copy it to user space buffer.
*/
- copied = copy_page_to_iter(page, offset, nr, to);
+ copied = copy_page_to_iter(page, offset, want, to);
put_page(page);
}
offset += copied;
--
2.41.0.255.g8b1d071c50-goog
* [PATCH v3 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read
2023-07-07 20:19 [PATCH v3 0/4] Improve hugetlbfs read on HWPOISON hugepages Jiaqi Yan
` (2 preceding siblings ...)
2023-07-07 20:19 ` [PATCH v3 3/4] hugetlbfs: improve read HWPOISON hugepage Jiaqi Yan
@ 2023-07-07 20:19 ` Jiaqi Yan
3 siblings, 0 replies; 17+ messages in thread
From: Jiaqi Yan @ 2023-07-07 20:19 UTC (permalink / raw)
To: akpm, mike.kravetz, naoya.horiguchi
Cc: songmuchun, shy828301, linmiaohe, linux-mm, linux-kernel,
duenwen, axelrasmussen, jthoughton, Jiaqi Yan
Add tests for the improvement made to read operation on HWPOISON
hugetlb page with different read granularities. For each chunk size,
three read scenarios are tested:
1. Simple regression test on read without HWPOISON.
2. Sequential page-by-page reads succeed until they encounter the 1st
raw HWPOISON subpage.
3. After skipping a raw HWPOISON subpage via lseek, read()s always succeed.
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
.../selftests/mm/hugetlb-read-hwpoison.c | 322 ++++++++++++++++++
3 files changed, 324 insertions(+)
create mode 100644 tools/testing/selftests/mm/hugetlb-read-hwpoison.c
diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore
index 7e2a982383c0..cdc9ce4426b9 100644
--- a/tools/testing/selftests/mm/.gitignore
+++ b/tools/testing/selftests/mm/.gitignore
@@ -5,6 +5,7 @@ hugepage-mremap
hugepage-shm
hugepage-vmemmap
hugetlb-madvise
+hugetlb-read-hwpoison
khugepaged
map_hugetlb
map_populate
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 66d7c07dc177..b7fce9073279 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -41,6 +41,7 @@ TEST_GEN_PROGS += gup_longterm
TEST_GEN_PROGS += gup_test
TEST_GEN_PROGS += hmm-tests
TEST_GEN_PROGS += hugetlb-madvise
+TEST_GEN_PROGS += hugetlb-read-hwpoison
TEST_GEN_PROGS += hugepage-mmap
TEST_GEN_PROGS += hugepage-mremap
TEST_GEN_PROGS += hugepage-shm
diff --git a/tools/testing/selftests/mm/hugetlb-read-hwpoison.c b/tools/testing/selftests/mm/hugetlb-read-hwpoison.c
new file mode 100644
index 000000000000..ba6cc6f9cabc
--- /dev/null
+++ b/tools/testing/selftests/mm/hugetlb-read-hwpoison.c
@@ -0,0 +1,322 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+
+#include <linux/magic.h>
+#include <sys/mman.h>
+#include <sys/statfs.h>
+#include <errno.h>
+#include <stdbool.h>
+
+#include "../kselftest.h"
+
+#define PREFIX " ... "
+#define ERROR_PREFIX " !!! "
+
+#define MAX_WRITE_READ_CHUNK_SIZE (getpagesize() * 16)
+#define MAX(a, b) (((a) > (b)) ? (a) : (b))
+
+enum test_status {
+ TEST_PASSED = 0,
+ TEST_FAILED = 1,
+ TEST_SKIPPED = 2,
+};
+
+static char *status_to_str(enum test_status status)
+{
+ switch (status) {
+ case TEST_PASSED:
+ return "TEST_PASSED";
+ case TEST_FAILED:
+ return "TEST_FAILED";
+ case TEST_SKIPPED:
+ return "TEST_SKIPPED";
+ default:
+ return "TEST_???";
+ }
+}
+
+static int setup_filemap(char *filemap, size_t len, size_t wr_chunk_size)
+{
+ char iter = 0;
+
+ for (size_t offset = 0; offset < len;
+ offset += wr_chunk_size) {
+ iter++;
+ memset(filemap + offset, iter, wr_chunk_size);
+ }
+
+ return 0;
+}
+
+static bool verify_chunk(char *buf, size_t len, char val)
+{
+ size_t i;
+
+ for (i = 0; i < len; ++i) {
+ if (buf[i] != val) {
+ printf(PREFIX ERROR_PREFIX "check fail: buf[%lu] = %u != %u\n",
+ i, buf[i], val);
+ return false;
+ }
+ }
+
+ return true;
+}
+
+static bool seek_read_hugepage_filemap(int fd, size_t len, size_t wr_chunk_size,
+ off_t offset, size_t expected)
+{
+ char buf[MAX_WRITE_READ_CHUNK_SIZE];
+ ssize_t ret_count = 0;
+ ssize_t total_ret_count = 0;
+ char val = offset / wr_chunk_size + offset % wr_chunk_size;
+
+ printf(PREFIX PREFIX "init val=%u with offset=0x%lx\n", val, offset);
+ printf(PREFIX PREFIX "expect to read 0x%lx bytes of data in total\n",
+ expected);
+ if (lseek(fd, offset, SEEK_SET) < 0) {
+ perror(PREFIX ERROR_PREFIX "seek failed");
+ return false;
+ }
+
+ while (offset + total_ret_count < len) {
+ ret_count = read(fd, buf, wr_chunk_size);
+ if (ret_count == 0) {
+ printf(PREFIX PREFIX "read reach end of the file\n");
+ break;
+ } else if (ret_count < 0) {
+ perror(PREFIX ERROR_PREFIX "read failed");
+ break;
+ }
+ ++val;
+ if (!verify_chunk(buf, ret_count, val))
+ return false;
+
+ total_ret_count += ret_count;
+ }
+ printf(PREFIX PREFIX "actually read 0x%lx bytes of data in total\n",
+ total_ret_count);
+
+ return total_ret_count == expected;
+}
+
+static bool read_hugepage_filemap(int fd, size_t len,
+ size_t wr_chunk_size, size_t expected)
+{
+ char buf[MAX_WRITE_READ_CHUNK_SIZE];
+ ssize_t ret_count = 0;
+ ssize_t total_ret_count = 0;
+ char val = 0;
+
+ printf(PREFIX PREFIX "expect to read 0x%lx bytes of data in total\n",
+ expected);
+ while (total_ret_count < len) {
+ ret_count = read(fd, buf, wr_chunk_size);
+ if (ret_count == 0) {
+ printf(PREFIX PREFIX "read reach end of the file\n");
+ break;
+ } else if (ret_count < 0) {
+ perror(PREFIX ERROR_PREFIX "read failed");
+ break;
+ }
+ ++val;
+ if (!verify_chunk(buf, ret_count, val))
+ return false;
+
+ total_ret_count += ret_count;
+ }
+ printf(PREFIX PREFIX "actually read 0x%lx bytes of data in total\n",
+ total_ret_count);
+
+ return total_ret_count == expected;
+}
+
+static enum test_status
+test_hugetlb_read(int fd, size_t len, size_t wr_chunk_size)
+{
+ enum test_status status = TEST_SKIPPED;
+ char *filemap = NULL;
+
+ if (ftruncate(fd, len) < 0) {
+ perror(PREFIX ERROR_PREFIX "ftruncate failed");
+ return status;
+ }
+
+ filemap = mmap(NULL, len, PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_POPULATE, fd, 0);
+ if (filemap == MAP_FAILED) {
+ perror(PREFIX ERROR_PREFIX "mmap for primary mapping failed");
+ goto done;
+ }
+
+ setup_filemap(filemap, len, wr_chunk_size);
+ status = TEST_FAILED;
+
+ if (read_hugepage_filemap(fd, len, wr_chunk_size, len))
+ status = TEST_PASSED;
+
+ munmap(filemap, len);
+done:
+ if (ftruncate(fd, 0) < 0) {
+ perror(PREFIX ERROR_PREFIX "ftruncate back to 0 failed");
+ status = TEST_FAILED;
+ }
+
+ return status;
+}
+
+static enum test_status
+test_hugetlb_read_hwpoison(int fd, size_t len, size_t wr_chunk_size,
+ bool skip_hwpoison_page)
+{
+ enum test_status status = TEST_SKIPPED;
+ char *filemap = NULL;
+ char *hwp_addr = NULL;
+ const unsigned long pagesize = getpagesize();
+
+ if (ftruncate(fd, len) < 0) {
+ perror(PREFIX ERROR_PREFIX "ftruncate failed");
+ return status;
+ }
+
+ filemap = mmap(NULL, len, PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_POPULATE, fd, 0);
+ if (filemap == MAP_FAILED) {
+ perror(PREFIX ERROR_PREFIX "mmap for primary mapping failed");
+ goto done;
+ }
+
+ setup_filemap(filemap, len, wr_chunk_size);
+ status = TEST_FAILED;
+
+ /*
+ * Poisoned hugetlb page layout (assume hugepagesize=2MB):
+ * |<---------------------- 1MB ---------------------->|
+ * |<---- healthy page ---->|<---- HWPOISON page ----->|
+ * |<------------------- (1MB - 8KB) ----------------->|
+ */
+ hwp_addr = filemap + len / 2 + pagesize;
+ if (madvise(hwp_addr, pagesize, MADV_HWPOISON) < 0) {
+ perror(PREFIX ERROR_PREFIX "MADV_HWPOISON failed");
+ goto unmap;
+ }
+
+ if (!skip_hwpoison_page) {
+ /*
+ * Userspace should be able to read (1MB + 1 page) from
+ * the beginning of the HWPOISONed hugepage.
+ */
+ if (read_hugepage_filemap(fd, len, wr_chunk_size,
+ len / 2 + pagesize))
+ status = TEST_PASSED;
+ } else {
+ /*
+ * Userspace should be able to read (1MB - 2 pages) from
+ * HWPOISONed hugepage.
+ */
+ if (seek_read_hugepage_filemap(fd, len, wr_chunk_size,
+ len / 2 + MAX(2 * pagesize, wr_chunk_size),
+ len / 2 - MAX(2 * pagesize, wr_chunk_size)))
+ status = TEST_PASSED;
+ }
+
+unmap:
+ munmap(filemap, len);
+done:
+ if (ftruncate(fd, 0) < 0) {
+ perror(PREFIX ERROR_PREFIX "ftruncate back to 0 failed");
+ status = TEST_FAILED;
+ }
+
+ return status;
+}
+
+static int create_hugetlbfs_file(struct statfs *file_stat)
+{
+ int fd;
+
+ fd = memfd_create("hugetlb_tmp", MFD_HUGETLB);
+ if (fd < 0) {
+ perror(PREFIX ERROR_PREFIX "could not open hugetlbfs file");
+ return -1;
+ }
+
+ memset(file_stat, 0, sizeof(*file_stat));
+ if (fstatfs(fd, file_stat)) {
+ perror(PREFIX ERROR_PREFIX "fstatfs failed");
+ goto close;
+ }
+ if (file_stat->f_type != HUGETLBFS_MAGIC) {
+ printf(PREFIX ERROR_PREFIX "not hugetlbfs file\n");
+ goto close;
+ }
+
+ return fd;
+close:
+ close(fd);
+ return -1;
+}
+
+int main(void)
+{
+ int fd;
+ struct statfs file_stat;
+ enum test_status status;
+ /* Test read() in different granularity. */
+ size_t wr_chunk_sizes[] = {
+ getpagesize() / 2, getpagesize(),
+ getpagesize() * 2, getpagesize() * 4
+ };
+ size_t i;
+
+ for (i = 0; i < ARRAY_SIZE(wr_chunk_sizes); ++i) {
+ printf("Write/read chunk size=0x%lx\n",
+ wr_chunk_sizes[i]);
+
+ fd = create_hugetlbfs_file(&file_stat);
+ if (fd < 0)
+ goto create_failure;
+ printf(PREFIX "HugeTLB read regression test...\n");
+ status = test_hugetlb_read(fd, file_stat.f_bsize,
+ wr_chunk_sizes[i]);
+ printf(PREFIX "HugeTLB read regression test...%s\n",
+ status_to_str(status));
+ close(fd);
+ if (status == TEST_FAILED)
+ return -1;
+
+ fd = create_hugetlbfs_file(&file_stat);
+ if (fd < 0)
+ goto create_failure;
+ printf(PREFIX "HugeTLB read HWPOISON test...\n");
+ status = test_hugetlb_read_hwpoison(fd, file_stat.f_bsize,
+ wr_chunk_sizes[i], false);
+ printf(PREFIX "HugeTLB read HWPOISON test...%s\n",
+ status_to_str(status));
+ close(fd);
+ if (status == TEST_FAILED)
+ return -1;
+
+ fd = create_hugetlbfs_file(&file_stat);
+ if (fd < 0)
+ goto create_failure;
+ printf(PREFIX "HugeTLB seek then read HWPOISON test...\n");
+ status = test_hugetlb_read_hwpoison(fd, file_stat.f_bsize,
+ wr_chunk_sizes[i], true);
+ printf(PREFIX "HugeTLB seek then read HWPOISON test...%s\n",
+ status_to_str(status));
+ close(fd);
+ if (status == TEST_FAILED)
+ return -1;
+ }
+
+ return 0;
+
+create_failure:
+ printf(ERROR_PREFIX "Abort test: failed to create hugetlbfs file\n");
+ return -1;
+}
--
2.41.0.255.g8b1d071c50-goog