linux-mm.kvack.org archive mirror
From: Jiaqi Yan <jiaqiyan@google.com>
To: kirill.shutemov@linux.intel.com, kirill@shutemov.name,
	shy828301@gmail.com,  tongtiangen@huawei.com,
	tony.luck@intel.com
Cc: naoya.horiguchi@nec.com, linmiaohe@huawei.com,
	linux-mm@kvack.org,  akpm@linux-foundation.org,
	osalvador@suse.de, wangkefeng.wang@huawei.com,
	 stevensd@chromium.org, hughd@google.com
Subject: Re: [PATCH v12 0/3] Memory poison recovery in khugepaged collapsing
Date: Tue, 4 Apr 2023 11:44:16 -0700
Message-ID: <CACw3F52kiTnfJBwJ5-Jq5pnqpHTD4i+cRbM-a7LSPHmmpLUjDA@mail.gmail.com>
In-Reply-To: <20230329151121.949896-1-jiaqiyan@google.com>

Friendly ping for review :)

On Wed, Mar 29, 2023 at 8:11 AM Jiaqi Yan <jiaqiyan@google.com> wrote:

> Problem
> =======
> Memory DIMMs are subject to multi-bit flips, i.e. memory errors.
> As server RAM in data centers and the cloud grows in size and density,
> both the likelihood and the number of uncorrectable memory errors
> increase. There are already mechanisms in the kernel to recover
> from uncorrectable memory errors. This series of patches provides
> the recovery mechanism for the particular kernel agent khugepaged
> when it collapses memory pages.
>
> Impact
> ======
> The main reason we chose to make khugepaged collapsing tolerant of
> memory failures is that it is very likely to access poisoned memory
> while performing functionally optional compaction actions.
> Standard applications typically don't have strict requirements on
> the size of their pages, so they are given 4K pages by the kernel.
> The kernel is able to improve application performance by either
>
>   1) giving applications 2M pages to begin with, or
>   2) collapsing 4K pages into 2M pages when possible.
>
> This collapsing operation is done by khugepaged, a kernel agent that
> is constantly scanning memory. When collapsing 4K pages into a 2M page,
> it must copy the data from the 4K pages into a physically contiguous
> 2M page. Therefore, as long as there exists one poisoned cache line in
> collapsible 4K pages, khugepaged will eventually access it. The current
> impact on users is a kernel panic triggered by a machine check exception.
> However, khugepaged’s compaction operations are not functionally required
> kernel actions. Therefore making khugepaged tolerant of poisoned memory
> will greatly improve user experience.
>
> This patch series is for cases where khugepaged is the first agent
> to detect the memory errors on the poisoned pages. IOW, the pages
> are not known to have memory errors when khugepaged collapsing gets to
> them. In our observation, this happens frequently when the huge page
> ratio of the system is relatively low, which is fairly common in
> virtual machines running in the cloud.
>
> Solution
> ========
> As stated before, it is undesirable to crash the system only because
> khugepaged accesses poisoned pages while it is collapsing 4K pages.
> The high level idea of this patch series is to skip the group of pages
> (usually 512 4K-size pages) once khugepaged finds one of them is poisoned,
> as these pages have become ineligible to be collapsed.
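
To make the "skip the whole group on the first poisoned page" idea concrete, here is a minimal userspace sketch. It is not the code from the patches: `struct small_page`, `copy_page_checked`, `collapse_copy` and the `COLLAPSE_*` status names are all hypothetical stand-ins (the status is only loosely modeled on the SCAN_COPY_MC result named in the changelog), and a `poisoned` flag simulates what the hardware would report via a machine check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SMALL_PAGE_SIZE 4096u

/* Hypothetical status codes, loosely modeled on SCAN_COPY_MC. */
enum collapse_status { COLLAPSE_OK = 0, COLLAPSE_COPY_MC = 1 };

/* Hypothetical model of one 4K source page. */
struct small_page {
	unsigned char data[SMALL_PAGE_SIZE];
	bool poisoned; /* simulates an uncorrectable memory error */
};

/*
 * Stand-in for a machine-check-safe copy: returns 0 on success and
 * nonzero if the source is poisoned, instead of crashing.
 */
static int copy_page_checked(unsigned char *dst, const struct small_page *src)
{
	if (src->poisoned)
		return -1;
	memcpy(dst, src->data, SMALL_PAGE_SIZE);
	return 0;
}

/*
 * Copy nr small pages into one contiguous buffer. Stop at the first
 * poisoned page and report failure so the caller can skip the group.
 * *copied is set to the number of pages copied before stopping.
 */
static enum collapse_status collapse_copy(unsigned char *huge,
					   const struct small_page *pages,
					   size_t nr, size_t *copied)
{
	size_t i;

	assert(copied != NULL);
	for (i = 0; i < nr; i++) {
		if (copy_page_checked(huge + i * SMALL_PAGE_SIZE, &pages[i]))
			break;
	}
	*copied = i;
	return i == nr ? COLLAPSE_OK : COLLAPSE_COPY_MC;
}
```

In a real 2M collapse `nr` would usually be 512; the point of the sketch is only that the copy step returns a status the caller can act on rather than taking the machine down.
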
>
> We are also careful to unwind operations khugepaged has performed before
> it detects memory failures. For example, before copying and collapsing
> a group of anonymous pages into a huge page, the source pages will be
> isolated and their page table is unlinked from their PMD. These operations
> need to be undone in order to ensure these pages are not changed/lost from
> the perspective of other threads (both user and kernel space). As for
> file backed memory pages, there already exists a rollback case. This
> patch just extends it so that khugepaged also correctly rolls back when
> it fails to copy poisoned 4K pages.
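
The unwind described above can be pictured with a small userspace model. This is only a sketch under loud assumptions: `struct region`, `try_collapse`, `PMD_NONE` and `PMD_HUGE` are hypothetical names, an `int` stands in for the PMD entry, and `NR_PAGES` is kept at 8 for readability. Only the `orig_pmd` name comes from the changelog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 8  /* hypothetical; a 2M collapse usually covers 512 pages */
#define PMD_NONE 0  /* hypothetical: page table unlinked */
#define PMD_HUGE -1 /* hypothetical: PMD now maps the huge page */

struct page_state {
	bool isolated;
	bool poisoned; /* simulates an uncorrectable memory error */
};

struct region {
	struct page_state pages[NR_PAGES];
	int pmd; /* stand-in for the PMD entry linking the page table */
};

/*
 * Returns true if the region was collapsed. Returns false if a poisoned
 * page was hit, in which case the region is restored to its prior state.
 */
static bool try_collapse(struct region *r)
{
	int orig_pmd;
	size_t i;
	bool copy_ok = true;

	assert(r != NULL);
	orig_pmd = r->pmd;

	/* Prepare: isolate the source pages and unlink the page table. */
	for (i = 0; i < NR_PAGES; i++)
		r->pages[i].isolated = true;
	r->pmd = PMD_NONE;

	/* Copy step: fails as soon as one source page is poisoned. */
	for (i = 0; i < NR_PAGES; i++) {
		if (r->pages[i].poisoned) {
			copy_ok = false;
			break;
		}
	}

	if (!copy_ok) {
		/* Unwind: relink the original PMD, put the pages back. */
		r->pmd = orig_pmd;
		for (i = 0; i < NR_PAGES; i++)
			r->pages[i].isolated = false;
		return false;
	}

	r->pmd = PMD_HUGE;
	return true;
}
```

The design point is that everything changed before the copy is recorded (here just `orig_pmd`) so that a failed copy leaves other threads seeing exactly what they saw before.
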
>
> Changelog
> =========
> v12 changes
> - Incorporate feedback from Yang Shi <shy828301@gmail.com>.
> - Drop unused pmd from __collapse_huge_page_copy_succeeded.
> - Drop unused address from __collapse_huge_page_copy_failed.
> - smp_mb() should be after filemap_nr_thps_dec.
> - This revision is rebased to mm-unstable at commit 9b175ce664d33
>   ("mm: move free_area_empty() to mm/internal.h")
>
> v11 changes
> - Incorporate feedback from Yang Shi <shy828301@gmail.com> and Hugh
>   Dickins <hughd@google.com>
> - Replace releasing pages for-loop with release_pte_pages in
>   __collapse_huge_page_copy_failed.
> - Rename pte_ptl to ptl in __collapse_huge_page_copy_succeeded.
> - Fix a bug in __collapse_huge_page_copy_succeeded: ptep_clear should be
>   used instead of pte_clear.
> - Drop _address in __collapse_huge_page_copy_succeeded.
> - Add smp_mb() before updating filemap_nr_thps_dec.
> - Move `nr = thp_nr_pages()` closer to its references.
> - Remove an unnecessary goto statement.
> - This revision is rebased to mm-unstable at commit b4e1277ee31db
>   ("xtensa: reword ARCH_FORCE_MAX_ORDER prompt and help text")
>
> v10 changes
> - Incorporate feedback from Kirill A. Shutemov
>   <kirill.shutemov@linux.intel.com>
> - Refactor the 2nd loop (after the loop for copying memory) into 2 helper
>   functions, one for actions to take when copying succeeded, one for when
>   copying failed due to #MC.
> - Use copy_mc_user_highpage for anonymous memory.
> - Introduce copy_mc_highpage and use it for file-backed memory.
> - Rename the original PMD from `rollback` to `orig_pmd`.
> - Some minor changes in comments, e.g. `normal page` to `raw page`.
> - This revision is rebased to mm-unstable at commit df3ae4347aff9
>   ("dma-buf: system_heap: avoid reclaim for order 4")
>
> v9 changes
> - Incorporate feedback from Andrew Morton <akpm@linux-foundation.org>
> - Move copy_mc_highpage into khugepaged.c as a static out-of-line
>   function copy_mc_page.
>
> v8 changes
> - Incorporate feedback from Tony Luck <tony.luck@intel.com>
> - Rename copy_highpage_mc to copy_mc_highpage.
> - Update copy_mc_highpage with kmsan changes.
> - Code style changes:
>   1) copy_mc_highpage returns int as "copy" is an action and is consistent
>      with copy_mc_user_highpage.
>   2) __collapse_huge_page_copy returns scan_result(int) and is consistent
>      with __collapse_huge_page_isolate/swapin.
>   3) variables are declared on separate lines in collapse_file.
>
> v7 changes
> - Fix a bug "KASAN: stack-out-of-bounds Read in collapse_file". After
>   copying all pages into the huge page, clear_highpage should use index
>   instead of page->index.
>
> v6 changes
> - Address comments from Kirill Shutemov <kirill@shutemov.name>
> - Rewrite __collapse_huge_page_copy to make rollback operations
>   clearer to the reader.
> - Add detailed test steps in each commit message.
>
> v5 changes
> - Rebase patches to mm-unstable at
>   commit ffb39098bf87 ("Merge tag 'linux-kselftest-kunit-6.1-rc1' of
>   git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest").
> - Resolves conflicts with:
>   commit 2f55f070e5b8 ("mm/khugepaged: minor cleanup for collapse_file")
>   commit 1baec203b77c ("mm/khugepaged: try to free transhuge swapcache
>   when possible")
>
> v4 changes
> - Incorporate feedback from Yang Shi <shy828301@gmail.com>
> - Remove tracepoint for __collapse_huge_page_copy, just keep SCAN_COPY_MC
>   and let trace_mm_collapse_huge_page report it
> - Remove unnecessary comments
>
> v3 changes
> - Incorporate feedback from Yang Shi <shy828301@gmail.com>
> - Add tracepoint for __collapse_huge_page_copy
> - Restore PMD in collapse_huge_page
> - Correct comment about mmap_read_lock
>
> v2 changes
> - Incorporate feedback from Yang Shi <shy828301@gmail.com>
> - Only keep copy_highpage_mc
> - Add new scan_result SCAN_COPY_MC
> - Defer NR_FILE_THPS update until copying succeeded
>
> Jiaqi Yan (3):
>   mm/khugepaged: recover from poisoned anonymous memory
>   mm/hwpoison: introduce copy_mc_highpage
>   mm/khugepaged: recover from poisoned file-backed memory
>
>  include/linux/highmem.h            |  54 ++++++--
>  include/trace/events/huge_memory.h |   3 +-
>  mm/khugepaged.c                    | 200 ++++++++++++++++++++++-------
>  3 files changed, 198 insertions(+), 59 deletions(-)
>
> --
> 2.40.0.348.gf938b09366-goog
>
>

Thread overview: 9+ messages
2023-03-29 15:11 Jiaqi Yan
2023-03-29 15:11 ` [PATCH v12 1/3] mm/khugepaged: recover from poisoned anonymous memory Jiaqi Yan
2023-03-29 15:11 ` [PATCH v12 2/3] mm/hwpoison: introduce copy_mc_highpage Jiaqi Yan
2023-03-29 15:11 ` [PATCH v12 3/3] mm/khugepaged: recover from poisoned file-backed memory Jiaqi Yan
2023-04-04 18:44 ` Jiaqi Yan [this message]
2023-04-04 19:07   ` [PATCH v12 0/3] Memory poison recovery in khugepaged collapsing Andrew Morton
2023-04-05  3:57   ` Yang Shi
2023-04-06 18:12     ` Jiaqi Yan
2023-04-06 21:56       ` Andrew Morton
