Re: [PATCH] zswap: do not crash the kernel on decompression failure

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Nhat Pham <nphamcs@gmail.com>
To: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	akpm@linux-foundation.org, chengming.zhou@linux.dev,
	 linux-mm@kvack.org, kernel-team@meta.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] zswap: do not crash the kernel on decompression failure
Date: Wed, 26 Feb 2025 15:20:13 -0800	[thread overview]
Message-ID: <CAKEwX=PR3tJM4X00hSua-w-FNR_ZwQ1oRqdT2Cgj_FV9cCUing@mail.gmail.com> (raw)
In-Reply-To: <Z780VzBOE3LKY0yi@google.com>

On Wed, Feb 26, 2025 at 7:33 AM Yosry Ahmed <yosry.ahmed@linux.dev> wrote:
>
> On Tue, Feb 25, 2025 at 11:57:27PM -0500, Johannes Weiner wrote:
> > On Wed, Feb 26, 2025 at 03:12:35AM +0000, Yosry Ahmed wrote:
> > > On Tue, Feb 25, 2025 at 01:32:00PM -0800, Nhat Pham wrote:
> > > > Currently, we crash the kernel when a decompression failure occurs in
> > > > zswap (either because of memory corruption, or a bug in the compression
> > > > algorithm). This is overkill. We should only SIGBUS the unfortunate
> > > > process asking for the zswap entry on zswap load, and skip the corrupted
> > > > entry in zswap writeback.
> > >
> > > Some relevant observations/questions, but not really actionable for this
> > > patch, perhaps some future work, or more likely some incoherent
> > > illogical thoughts :
> > >
> > > (1) It seems like not making the folio uptodate will cause shmem faults
> > > to mark the swap entry as hwpoisoned, but I don't see similar handling
> > > for do_swap_page(). So it seems like even if we SIGBUS the process,
> > > other processes mapping the same page could follow in the same
> > > footsteps.
> >
> > It's analogous to what __end_swap_bio_read() does for block backends,
> > so it's hitchhiking on the standard swap protocol for read failures.
>
> Right, that's also how I got the idea when I did the same for large
> folios handling.

And your handling of the large folio (along with the comment in the
other thread) was how I got the idea for this patch :)

>
> >
> > The page sticks around if there are other users. It can get reclaimed,
> > but since it's not marked dirty, it won't get overwritten. Another
> > access will either find it in the swapcache and die on !uptodate; if
> > it was reclaimed, it will attempt another decompression. If all
> > references have been killed, zswap_invalidate() will finally drop it.
> >
> > Swapoff actually poisons the page table as well (unuse_pte).
>
> Right. My question was basically why don't we also poison the page table
> in do_swap_page() in this case. It's like that we never swapoff.

That would require a rmap walk right? To also poison the other PTEs
that point to the faulty (z)swap entry?

Or am I misunderstanding your point :)

>
> This will cause subsequent fault attempts to return VM_FAULT_HWPOISON
> quickly without doing through the swapcache or decompression. Probably
> not a big deal, but shmem does it so maybe it'd be nice to do it for
> consistency.
>
> >
> > > (2) A hwpoisoned swap entry results in VM_FAULT_SIGBUS in some cases
> > > (e.g. shmem_fault() -> shmem_get_folio_gfp() -> shmem_swapin_folio()),
> > > even though we have VM_FAULT_HWPOISON. This patch falls under this
> > > bucket, but unfortunately we cannot tell for sure if it's a hwpoision or
> > > a decompression bug.
> >
> > Are you sure? Actual memory failure should replace the ptes of a
> > mapped shmem page with TTU_HWPOISON, which turns them into special
> > swap entries that trigger VM_FAULT_HWPOISON in do_swap_page().
>
> I was looking at the shmem_fault() path. It seems like for this path we
> end up with VM_SIGBUS because shmem_swapin_folio() returns -EIO and not
> -EHWPOISON. This seems like something that can be easily fixed though,
> unless -EHWPOISON is not always correct for a diffrent reason.
>
> >
> > Anon swap distinguishes as long as the swapfile is there. Swapoff
> > installs poison markers, which are then handled the same in future
> > faults (VM_FAULT_HWPOISON):
> >
> > /*
> >  * "Poisoned" here is meant in the very general sense of "future accesses are
> >  * invalid", instead of referring very specifically to hardware memory errors.
> >  * This marker is meant to represent any of various different causes of this.
> >  *
> >  * Note that, when encountered by the faulting logic, PTEs with this marker will
> >  * result in VM_FAULT_HWPOISON and thus regardless trigger hardware memory error
> >  * logic.
>
> If that's the case, maybe it's better for zswap in the future if we stop
> relying on not marking the folio uptodate, and instead propagate an
> error through swap_read_folio() to the callers to make sure we always
> return VM_FAULT_HWPOISON and install poison markers.
>
> The handling is a bit quirky and inconsistent, but it ultimately results
> in VM_SIGBUS or VM_FAULT_HWPOISON which I guess is fine for now.

Yeah I think it's OK for now. FWIW it's consistent with the way we
treat swap IO error, as you pointed out :)

>
> >  */
> > #define  PTE_MARKER_POISONED                    BIT(1)

next prev parent reply	other threads:[~2025-02-26 23:20 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-25 21:32 Nhat Pham
2025-02-25 22:25 ` Andrew Morton
2025-02-25 22:28   ` Nhat Pham
2025-02-26  0:51 ` Johannes Weiner
2025-02-26  2:45   ` Yosry Ahmed
2025-02-26  3:57     ` Johannes Weiner
2025-02-26 15:42       ` Yosry Ahmed
2025-02-26 23:39   ` Nhat Pham
2025-02-26  2:40 ` Yosry Ahmed
2025-02-26 23:16   ` Nhat Pham
2025-02-27  0:03     ` Yosry Ahmed
2025-02-26  3:12 ` Yosry Ahmed
2025-02-26  4:57   ` Johannes Weiner
2025-02-26 15:33     ` Yosry Ahmed
2025-02-26 23:20       ` Nhat Pham [this message]
2025-02-27  0:00         ` Yosry Ahmed
2025-02-26 23:29   ` Nhat Pham
2025-02-26 23:58     ` Yosry Ahmed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKEwX=PR3tJM4X00hSua-w-FNR_ZwQ1oRqdT2Cgj_FV9cCUing@mail.gmail.com' \
    --to=nphamcs@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=chengming.zhou@linux.dev \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=yosry.ahmed@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox