From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Jiaqi Yan <jiaqiyan@google.com>
Cc: shy828301@gmail.com, tongtiangen@huawei.com, tony.luck@intel.com,
naoya.horiguchi@nec.com, kirill.shutemov@linux.intel.com,
linmiaohe@huawei.com, linux-mm@kvack.org,
akpm@linux-foundation.org
Subject: Re: [PATCH v5 1/2] mm/khugepaged: recover from poisoned anonymous memory
Date: Wed, 12 Oct 2022 02:59:18 +0300
Message-ID: <20221011235918.hvefriya4m3qdhr2@box.shutemov.name>
In-Reply-To: <20221010160142.1087120-2-jiaqiyan@google.com>

On Mon, Oct 10, 2022 at 09:01:41AM -0700, Jiaqi Yan wrote:
> Make __collapse_huge_page_copy return whether
> collapsing/copying anonymous pages succeeded,
> and make collapse_huge_page handle the return status.
>
> Break existing PTE scan loop into two for-loops.
> The first loop copies source pages into target huge page,
> and can fail gracefully when running into memory errors in
> source pages. If copying all pages succeeds, the second loop
> releases and clears up these normal pages.
> Otherwise, the second loop does the following to
> roll back the page table and page states:
> 1) re-establish the original PTEs-to-PMD connection.
> 2) release source pages back to their LRU list.
>
> Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> ---
> include/linux/highmem.h | 19 +++++
> include/trace/events/huge_memory.h | 3 +-
> mm/khugepaged.c | 130 ++++++++++++++++++++++-------
> 3 files changed, 121 insertions(+), 31 deletions(-)
>
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index 25679035ca283..91a65bdabcb33 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -332,6 +332,25 @@ static inline void copy_highpage(struct page *to, struct page *from)
>
> #endif
>
> +/*
> + * Machine check exception handled version of copy_highpage.
> + * Return true if copying page content failed; otherwise false.
> + * Note handling #MC requires arch opt-in.
> + */
> +static inline bool copy_highpage_mc(struct page *to, struct page *from)
> +{
> + char *vfrom, *vto;
> + unsigned long ret;
> +
> + vfrom = kmap_local_page(from);
> + vto = kmap_local_page(to);
> + ret = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE);
> + kunmap_local(vto);
> + kunmap_local(vfrom);
> +
> + return ret > 0;
> +}
> +
> static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> struct page *src_page, size_t src_off,
> size_t len)
> diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
> index d651f3437367d..756e991366639 100644
> --- a/include/trace/events/huge_memory.h
> +++ b/include/trace/events/huge_memory.h
> @@ -33,7 +33,8 @@
> EM( SCAN_ALLOC_HUGE_PAGE_FAIL, "alloc_huge_page_failed") \
> EM( SCAN_CGROUP_CHARGE_FAIL, "ccgroup_charge_failed") \
> EM( SCAN_TRUNCATED, "truncated") \
> - EMe(SCAN_PAGE_HAS_PRIVATE, "page_has_private") \
> + EM( SCAN_PAGE_HAS_PRIVATE, "page_has_private") \
> + EMe(SCAN_COPY_MC, "copy_poisoned_page") \
>
> #undef EM
> #undef EMe
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 70b7ac66411c0..552e7cb4c8b42 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -51,6 +51,7 @@ enum scan_result {
> SCAN_CGROUP_CHARGE_FAIL,
> SCAN_TRUNCATED,
> SCAN_PAGE_HAS_PRIVATE,
> + SCAN_COPY_MC,
> };
>
> #define CREATE_TRACE_POINTS
> @@ -673,44 +674,99 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> return 0;
> }
>
> -static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
> - struct vm_area_struct *vma,
> - unsigned long address,
> - spinlock_t *ptl,
> - struct list_head *compound_pagelist)
> +/*
> + * __collapse_huge_page_copy - attempts to copy memory contents from normal
> + * pages to a hugepage. Cleanup the normal pages if copying succeeds;
> + * otherwise restore the original page table and release isolated normal pages.
> + * Returns true if copying succeeds, otherwise returns false.
> + *
> + * @pte: starting of the PTEs to copy from
> + * @page: the new hugepage to copy contents to
> + * @pmd: pointer to the new hugepage's PMD
> + * @rollback: the original normal pages' PMD
> + * @vma: the original normal pages' virtual memory area
> + * @address: starting address to copy
> + * @pte_ptl: lock on normal pages' PTEs
> + * @compound_pagelist: list that stores compound pages
> + */
> +static bool __collapse_huge_page_copy(pte_t *pte,
> + struct page *page,
> + pmd_t *pmd,
> + pmd_t rollback,
> + struct vm_area_struct *vma,
> + unsigned long address,
> + spinlock_t *pte_ptl,
> + struct list_head *compound_pagelist)
> {
> struct page *src_page, *tmp;
> pte_t *_pte;
> - for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
> - _pte++, page++, address += PAGE_SIZE) {
> - pte_t pteval = *_pte;
> + pte_t pteval;
> + unsigned long _address;
> + spinlock_t *pmd_ptl;
> + bool copy_succeeded = true;
>
> + /*
> + * Copying pages' contents is subject to memory poison at any iteration.
> + */
> + for (_pte = pte, _address = address;
> + _pte < pte + HPAGE_PMD_NR;
> + _pte++, page++, _address += PAGE_SIZE) {
> + pteval = *_pte;
> +
> + if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval)))
> + clear_user_highpage(page, _address);
> + else {
> + src_page = pte_page(pteval);
> + if (copy_highpage_mc(page, src_page)) {
> + copy_succeeded = false;
> + break;
> + }
> + }
> + }
> +
> + if (!copy_succeeded) {
> + /*
> + * Copying failed, re-establish the regular PMD that points to
> + * the regular page table. Restoring PMD needs to be done prior
> + * to releasing pages. Since pages are still isolated and locked
> + * here, acquiring anon_vma_lock_write is unnecessary.
> + */
> + pmd_ptl = pmd_lock(vma->vm_mm, pmd);
> + pmd_populate(vma->vm_mm, pmd, pmd_pgtable(rollback));
> + spin_unlock(pmd_ptl);
> + }

Initially I expected a return here, but you handle copy_succeeded below. Hm.

> +
> + for (_pte = pte, _address = address; _pte < pte + HPAGE_PMD_NR;
> + _pte++, _address += PAGE_SIZE) {
> + pteval = *_pte;
> if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> - clear_user_highpage(page, address);
> - add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1);
> - if (is_zero_pfn(pte_pfn(pteval))) {
> - /*
> - * ptl mostly unnecessary.
> - */
> - spin_lock(ptl);
> - ptep_clear(vma->vm_mm, address, _pte);
> - spin_unlock(ptl);
> + if (copy_succeeded) {
> + add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1);
> + if (is_zero_pfn(pte_pfn(pteval))) {
> + /*
> + * pte_ptl mostly unnecessary.
> + */
> + spin_lock(pte_ptl);
> + pte_clear(vma->vm_mm, _address, _pte);
> + spin_unlock(pte_ptl);
> + }
> }

So this branch is a NOP if !copy_succeeded, right?

> } else {
> src_page = pte_page(pteval);
> - copy_user_highpage(page, src_page, address, vma);
> if (!PageCompound(src_page))
> release_pte_page(src_page);

And this branch only calls release_pte_page() (which I believe would screw
up the statistics).

Looks very broken. Or just hard to follow. Or both.

Please consider reworking the code to streamline handling of the
!copy_succeeded case.
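
To illustrate the kind of restructuring being asked for, here is a minimal
userspace sketch, not kernel code. It assumes the post-copy work can be split
into one helper per outcome, so the failure path returns early instead of
threading copy_succeeded through a shared loop. Every name in it (struct
src_page, copy_page_mc, collapse_copy_succeeded, collapse_copy_failed,
collapse_copy) is a hypothetical stand-in; locking, rmap and LRU handling are
reduced to plain flags and counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NR_PAGES   8    /* stand-in for HPAGE_PMD_NR */
#define PAGE_BYTES 16   /* stand-in for PAGE_SIZE */

/* Hypothetical model of one small source page behind a PTE. */
struct src_page {
	char data[PAGE_BYTES];
	bool present;   /* false models pte_none() or the zero page */
	bool poisoned;  /* true models a memory error hit while copying */
	bool isolated;  /* set by the isolate step, which is not modelled */
	bool freed;     /* set once the page is dropped after a good copy */
};

/* Models copy_highpage_mc(): returns true if the copy FAILED. */
static bool copy_page_mc(char *to, const struct src_page *from)
{
	if (from->poisoned)
		return true;
	memcpy(to, from->data, PAGE_BYTES);
	return false;
}

/* Success path only: account the empty slots, drop the old pages. */
static void collapse_copy_succeeded(struct src_page *src, size_t *anon_pages)
{
	for (size_t i = 0; i < NR_PAGES; i++) {
		if (!src[i].present) {
			(*anon_pages)++;  /* models add_mm_counter(MM_ANONPAGES, 1) */
			continue;
		}
		src[i].isolated = false;
		src[i].freed = true;
	}
}

/* Failure path only: restore the mapping first, then release the pages. */
static void collapse_copy_failed(struct src_page *src, bool *pmd_restored)
{
	*pmd_restored = true;     /* models pmd_populate() under pmd_lock() */
	for (size_t i = 0; i < NR_PAGES; i++)
		if (src[i].present)
			src[i].isolated = false;
}

/* Returns true if every page was copied; false after a full rollback. */
static bool collapse_copy(struct src_page *src, char *huge,
			  size_t *anon_pages, bool *pmd_restored)
{
	assert(src && huge && anon_pages && pmd_restored);

	for (size_t i = 0; i < NR_PAGES; i++) {
		char *dst = huge + i * PAGE_BYTES;

		if (!src[i].present) {
			memset(dst, 0, PAGE_BYTES);
		} else if (copy_page_mc(dst, &src[i])) {
			collapse_copy_failed(src, pmd_restored);
			return false;     /* early return on failure */
		}
	}

	collapse_copy_succeeded(src, anon_pages);
	return true;
}
```

With this shape the failure path never touches the counters and the success
path never restores the PMD, so each helper can be checked on its own.
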
> - /*
> - * ptl mostly unnecessary, but preempt has to
> - * be disabled to update the per-cpu stats
> - * inside page_remove_rmap().
> - */
> - spin_lock(ptl);
> - ptep_clear(vma->vm_mm, address, _pte);
> - page_remove_rmap(src_page, vma, false);
> - spin_unlock(ptl);
> - free_page_and_swap_cache(src_page);
> + if (copy_succeeded) {
> + /*
> + * pte_ptl mostly unnecessary, but preempt
> + * has to be disabled to update the per-cpu
> + * stats inside page_remove_rmap().
> + */
> + spin_lock(pte_ptl);
> + ptep_clear(vma->vm_mm, _address, _pte);
> + page_remove_rmap(src_page, vma, false);
> + spin_unlock(pte_ptl);
> + free_page_and_swap_cache(src_page);
> + }
> }
> }
>
> @@ -723,6 +779,8 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
> free_swap_cache(src_page);
> putback_lru_page(src_page);
> }
> +
> + return copy_succeeded;
> }
>
> static void khugepaged_alloc_sleep(void)
> @@ -1009,6 +1067,7 @@ static void collapse_huge_page(struct mm_struct *mm,
> struct vm_area_struct *vma;
> struct mmu_notifier_range range;
> gfp_t gfp;
> + bool copied = false;
>
> VM_BUG_ON(address & ~HPAGE_PMD_MASK);
>
> @@ -1121,9 +1180,13 @@ static void collapse_huge_page(struct mm_struct *mm,
> */
> anon_vma_unlock_write(vma->anon_vma);
>
> - __collapse_huge_page_copy(pte, new_page, vma, address, pte_ptl,
> - &compound_pagelist);
> + copied = __collapse_huge_page_copy(pte, new_page, pmd, _pmd,
> + vma, address, pte_ptl, &compound_pagelist);
> pte_unmap(pte);
> + if (!copied) {
> + result = SCAN_COPY_MC;
> + goto out_up_write;
> + }
> /*
> * spin_lock() below is not the equivalent of smp_wmb(), but
> * the smp_wmb() inside __SetPageUptodate() can be reused to
> @@ -2129,6 +2192,13 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
> khugepaged_scan_file(mm, file, pgoff, hpage);
> fput(file);
> } else {
> + /*
> + * mmap_read_lock is
> + * 1) still held if scan failed;
> + * 2) released if scan succeeded.
> + * It is not affected by collapsing or copying
> + * operations.
> + */
> ret = khugepaged_scan_pmd(mm, vma,
> khugepaged_scan.address,
> hpage);
> --
> 2.38.0.rc1.362.ged0d419d3c-goog
>
>
--
Kiryl Shutsemau / Kirill A. Shutemov