From: Jiaqi Yan <jiaqiyan@google.com>
To: naoya.horiguchi@nec.com, muchun.song@linux.dev, linmiaohe@huawei.com
Cc: akpm@linux-foundation.org, shuah@kernel.org, corbet@lwn.net,
osalvador@suse.de, rientjes@google.com, duenwen@google.com,
fvdl@google.com, linux-mm@kvack.org,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
Jane Chu <jane.chu@oracle.com>
Subject: Re: [PATCH v1 1/3] mm/memory-failure: userspace controls soft-offlining hugetlb pages
Date: Fri, 7 Jun 2024 15:26:16 -0700
Message-ID: <CACw3F50+ZhetCbeym3fDzKQr8d+HY7WXNRYUD5jh4_gTUWWEig@mail.gmail.com>
In-Reply-To: <20240531213439.2958891-2-jiaqiyan@google.com>
+CC Jane.
On Fri, May 31, 2024 at 2:34 PM Jiaqi Yan <jiaqiyan@google.com> wrote:
>
> Correctable memory errors are very common on servers with large
> amounts of memory, and are corrected by ECC. Soft offline is the
> kernel's additional recovery handling for memory pages that have
> (excessive) corrected memory errors. The impacted page is migrated to
> a healthy page if it is mapped or in use; the original page is
> discarded and excluded from any future use.
>
> The actual policy on whether (and when) to soft offline should be
> maintained by userspace, especially in the case of HugeTLB hugepages.
> Soft offline dissolves a hugepage, either in-use or free, into
> chunks of 4K pages, reducing the HugeTLB pool capacity by 1 hugepage.
> If userspace has not acknowledged such behavior, it may be surprised
> when a later mmap of hugepages returns MAP_FAILED due to a lack of
> hugepages. In addition, discarding an entire 1G memory page only
> because of corrected memory errors sounds very costly, and the kernel
> had better not do it under the hood. But today there are at least 2
> such cases:
> 1. The GHES driver sees both GHES_SEV_CORRECTED and
>    CPER_SEC_ERROR_THRESHOLD_EXCEEDED after parsing CPER.
> 2. The RAS Correctable Errors Collector counts correctable errors per
>    PFN, and the counter for a PFN reaches the threshold.
> In both cases, userspace has no control over the soft offline
> performed by the kernel's memory failure recovery.
>
> This commit gives userspace control over soft-offlining HugeTLB
> pages: the kernel only soft offlines a hugepage if userspace has
> opted in for that specific hugepage size. The interface to userspace
> is a new sysfs entry called softoffline_corrected_errors under the
> /sys/kernel/mm/hugepages/hugepages-${size}kB directory:
> * When softoffline_corrected_errors=0, skip soft offlining for all
>   hugepages of size ${size}kB.
> * When softoffline_corrected_errors=1, soft offline as before this
>   patch series.
>
> So the granularity of the control is per hugepage size, and the
> setting is kept in the corresponding hstate. By default,
> softoffline_corrected_errors is 1 to preserve the existing kernel
> behavior.
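
For illustration, a minimal userspace sketch of how an agent might
flip this knob for one hugepage size. It assumes only the sysfs path
described above; the helper names (knob_path, knob_write, knob_read)
are hypothetical and not part of this series.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Build the sysfs path of the knob for a hugepage size given in kB.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int knob_path(char *out, size_t len, unsigned long size_kb)
{
	int n = snprintf(out, len,
			 "/sys/kernel/mm/hugepages/hugepages-%lukB/"
			 "softoffline_corrected_errors", size_kb);

	return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Write 0 (skip soft offline) or 1 (allow it). Returns 0 on success. */
static int knob_write(const char *path, int enable)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", enable ? 1 : 0) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read the current value. Returns 0 or 1, or -1 on any error. */
static int knob_read(const char *path)
{
	FILE *f = fopen(path, "r");
	int v;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &v) != 1)
		v = -1;
	fclose(f);
	return v;
}
```

E.g. to opt the 1G pool out of soft offline, build the path with
knob_path(path, sizeof(path), 1048576) and call knob_write(path, 0).
Writing the real sysfs file will most likely need root.
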
>
> Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> ---
> include/linux/hugetlb.h | 17 +++++++++++++++++
> mm/hugetlb.c | 34 ++++++++++++++++++++++++++++++++++
> mm/memory-failure.c | 7 +++++++
> 3 files changed, 58 insertions(+)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 2b3c3a404769..55f9e9593cce 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -685,6 +685,7 @@ struct hstate {
>  	int next_nid_to_free;
>  	unsigned int order;
>  	unsigned int demote_order;
> +	unsigned int softoffline_corrected_errors;
>  	unsigned long mask;
>  	unsigned long max_huge_pages;
>  	unsigned long nr_huge_pages;
> @@ -1029,6 +1030,16 @@ void hugetlb_unregister_node(struct node *node);
>   */
>  bool is_raw_hwpoison_page_in_hugepage(struct page *page);
>
> +/*
> + * For certain hugepage size, when a hugepage has corrected memory error(s):
> + * - Return 0 if userspace wants to disable soft offlining the hugepage.
> + * - Return > 0 if userspace allows soft offlining the hugepage.
> + */
> +static inline int hugetlb_softoffline_corrected_errors(struct folio *folio)
> +{
> +	return folio_hstate(folio)->softoffline_corrected_errors;
> +}
> +
>  #else /* CONFIG_HUGETLB_PAGE */
>  struct hstate {};
>
> @@ -1226,6 +1237,12 @@ static inline bool hugetlbfs_pagecache_present(
>  {
>  	return false;
>  }
> +
> +static inline int hugetlb_softoffline_corrected_errors(struct folio *folio)
> +{
> +	return 1;
> +}
> +
>  #endif /* CONFIG_HUGETLB_PAGE */
>
>  static inline spinlock_t *huge_pte_lock(struct hstate *h,
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 6be78e7d4f6e..a184e28ce592 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4325,6 +4325,38 @@ static ssize_t demote_size_store(struct kobject *kobj,
>  }
>  HSTATE_ATTR(demote_size);
>
> +static ssize_t softoffline_corrected_errors_show(struct kobject *kobj,
> +						 struct kobj_attribute *attr,
> +						 char *buf)
> +{
> +	struct hstate *h = kobj_to_hstate(kobj, NULL);
> +
> +	return sysfs_emit(buf, "%d\n", h->softoffline_corrected_errors);
> +}
> +
> +static ssize_t softoffline_corrected_errors_store(struct kobject *kobj,
> +						  struct kobj_attribute *attr,
> +						  const char *buf,
> +						  size_t count)
> +{
> +	int err;
> +	unsigned long input;
> +	struct hstate *h = kobj_to_hstate(kobj, NULL);
> +
> +	err = kstrtoul(buf, 10, &input);
> +	if (err)
> +		return err;
> +
> +	/* softoffline_corrected_errors is either 0 or 1. */
> +	if (input > 1)
> +		return -EINVAL;
> +
> +	h->softoffline_corrected_errors = input;
> +
> +	return count;
> +}
> +HSTATE_ATTR(softoffline_corrected_errors);
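
The accept/reject behavior of the store handler above can be modeled
in userspace roughly as follows. This is a sketch, not kernel code:
parse_softoffline_value is a hypothetical name, and strtoul() only
approximates kstrtoul(). The explicit checks are my assumption of how
to get close to kstrtoul(): reject leading whitespace and '-', and
allow at most one trailing newline.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Userspace approximation of the parsing and range check done in
 * softoffline_corrected_errors_store(). Returns 0 and stores the value
 * in *out on success, or a negative errno value on rejection.
 */
static int parse_softoffline_value(const char *buf, unsigned int *out)
{
	char *end;
	unsigned long v;

	/* strtoul() would silently accept these; reject them up front. */
	if (*buf == '\0' || *buf == '-' || isspace((unsigned char)*buf))
		return -EINVAL;

	errno = 0;
	v = strtoul(buf, &end, 10);
	if (end == buf)
		return -EINVAL;		/* no digits at all */
	if (errno == ERANGE)
		return -ERANGE;		/* does not fit in unsigned long */

	/* A single trailing newline is fine, e.g. from "echo 1 > file". */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */

	/* softoffline_corrected_errors is either 0 or 1. */
	if (v > 1)
		return -EINVAL;

	*out = (unsigned int)v;
	return 0;
}
```

So "0\n" and "1" are accepted, while "2", "1x" and an empty write are
all rejected with -EINVAL and leave the stored value untouched.
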
> +
>  static struct attribute *hstate_attrs[] = {
>  	&nr_hugepages_attr.attr,
>  	&nr_overcommit_hugepages_attr.attr,
> @@ -4334,6 +4366,7 @@ static struct attribute *hstate_attrs[] = {
>  #ifdef CONFIG_NUMA
>  	&nr_hugepages_mempolicy_attr.attr,
>  #endif
> +	&softoffline_corrected_errors_attr.attr,
>  	NULL,
>  };
>
> @@ -4655,6 +4688,7 @@ void __init hugetlb_add_hstate(unsigned int order)
>  	h = &hstates[hugetlb_max_hstate++];
>  	mutex_init(&h->resize_lock);
>  	h->order = order;
> +	h->softoffline_corrected_errors = 1;
>  	h->mask = ~(huge_page_size(h) - 1);
>  	for (i = 0; i < MAX_NUMNODES; ++i)
>  		INIT_LIST_HEAD(&h->hugepage_freelists[i]);
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 16ada4fb02b7..7094fc4c62e2 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2776,6 +2776,13 @@ int soft_offline_page(unsigned long pfn, int flags)
>  		return -EIO;
>  	}
>
> +	if (PageHuge(page) &&
> +	    !hugetlb_softoffline_corrected_errors(page_folio(page))) {
> +		pr_info("soft offline: %#lx: hugetlb page is ignored\n", pfn);
> +		put_ref_page(pfn, flags);
> +		return -EINVAL;
> +	}
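
To spell out the effect of this check, here is a tiny standalone model
of the decision. It is only a sketch: struct hstate_model and
soft_offline_gate are hypothetical names that stand in for the real
hstate and for the early return added to soft_offline_page() above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the one hstate field this patch adds. */
struct hstate_model {
	unsigned int softoffline_corrected_errors;
};

/*
 * Returns 0 when soft offline may proceed, or -EINVAL when the page is
 * a hugetlb page whose size has been opted out by userspace. Pages
 * that are not hugetlb pages are never affected by the knob.
 */
static int soft_offline_gate(bool is_hugetlb, const struct hstate_model *h)
{
	if (is_hugetlb && h != NULL && !h->softoffline_corrected_errors)
		return -EINVAL;

	return 0;
}
```

In other words, only the (hugetlb, knob == 0) combination is skipped;
raw pages and opted-in hugepage sizes continue as before.
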
> +
>  	mutex_lock(&mf_mutex);
>
>  	if (PageHWPoison(page)) {
> --
> 2.45.1.288.g0e0cd299f1-goog
>