From: Yosry Ahmed
To: Johannes Weiner
Cc: Nhat Pham, akpm@linux-foundation.org, chengming.zhou@linux.dev, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org
Date: Wed, 26 Feb 2025 15:33:43 +0000
Subject: Re: [PATCH] zswap: do not crash the kernel on decompression failure
References: <20250225213200.729056-1-nphamcs@gmail.com> <20250226045727.GB1775487@cmpxchg.org>
In-Reply-To: <20250226045727.GB1775487@cmpxchg.org>

On Tue, Feb 25, 2025 at 11:57:27PM -0500, Johannes Weiner wrote:
> On Wed, Feb 26, 2025 at 03:12:35AM +0000, Yosry Ahmed wrote:
> > On Tue, Feb 25, 2025 at 01:32:00PM -0800, Nhat Pham wrote:
> > > Currently, we crash the kernel when a decompression failure occurs in
> > > zswap (either because of memory corruption, or a bug in the compression
> > > algorithm). This is overkill. We should only SIGBUS the unfortunate
> > > process asking for the zswap entry on zswap load, and skip the corrupted
> > > entry in zswap writeback.
> >
> > Some relevant observations/questions, but not really actionable for this
> > patch, perhaps some future work, or more likely some incoherent
> > illogical thoughts:
> >
> > (1) It seems like not making the folio uptodate will cause shmem faults
> > to mark the swap entry as hwpoisoned, but I don't see similar handling
> > for do_swap_page(). So it seems like even if we SIGBUS the process,
> > other processes mapping the same page could follow in the same
> > footsteps.
>
> It's analogous to what __end_swap_bio_read() does for block backends,
> so it's hitchhiking on the standard swap protocol for read failures.

Right, that's also how I got the idea when I did the same for large
folios handling.

>
> The page sticks around if there are other users. It can get reclaimed,
> but since it's not marked dirty, it won't get overwritten. Another
> access will either find it in the swapcache and die on !uptodate; if
> it was reclaimed, it will attempt another decompression. If all
> references have been killed, zswap_invalidate() will finally drop it.
>
> Swapoff actually poisons the page table as well (unuse_pte).

Right. My question was basically: why don't we also poison the page
table in do_swap_page() in this case? It's likely that we never swapoff.
This will cause subsequent fault attempts to return VM_FAULT_HWPOISON
quickly without going through the swapcache or decompression. Probably
not a big deal, but shmem does it, so maybe it'd be nice to do it for
consistency.

>
> > (2) A hwpoisoned swap entry results in VM_FAULT_SIGBUS in some cases
> > (e.g. shmem_fault() -> shmem_get_folio_gfp() -> shmem_swapin_folio()),
> > even though we have VM_FAULT_HWPOISON. This patch falls under this
> > bucket, but unfortunately we cannot tell for sure if it's a hwpoison or
> > a decompression bug.
>
> Are you sure? Actual memory failure should replace the ptes of a
> mapped shmem page with TTU_HWPOISON, which turns them into special
> swap entries that trigger VM_FAULT_HWPOISON in do_swap_page().

I was looking at the shmem_fault() path. It seems like for this path we
end up with VM_FAULT_SIGBUS because shmem_swapin_folio() returns -EIO
and not -EHWPOISON. This seems like something that can be easily fixed
though, unless -EHWPOISON is not always correct for a different reason.

>
> Anon swap distinguishes as long as the swapfile is there. Swapoff
> installs poison markers, which are then handled the same in future
> faults (VM_FAULT_HWPOISON):
>
> /*
>  * "Poisoned" here is meant in the very general sense of "future accesses are
>  * invalid", instead of referring very specifically to hardware memory errors.
>  * This marker is meant to represent any of various different causes of this.
>  *
>  * Note that, when encountered by the faulting logic, PTEs with this marker will
>  * result in VM_FAULT_HWPOISON and thus regardless trigger hardware memory error
>  * logic.

If that's the case, maybe it's better for zswap in the future if we stop
relying on not marking the folio uptodate, and instead propagate an error
through swap_read_folio() to the callers to make sure we always return
VM_FAULT_HWPOISON and install poison markers.
The handling is a bit quirky and inconsistent, but it ultimately results
in VM_FAULT_SIGBUS or VM_FAULT_HWPOISON, which I guess is fine for now.

>  */
> #define PTE_MARKER_POISONED BIT(1)
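For illustration only, here is a rough user-space C model (not kernel code) of the difference between the current behaviour discussed above and the idea of propagating the error and installing a poison marker. Everything in it is hypothetical: struct model_slot, model_zswap_load(), fault_current(), fault_proposed(), model_vmf_error() and the FAULT_* values merely stand in for the real swap-in paths and VM_FAULT_* codes. model_vmf_error() is assumed to roughly mirror the errno-to-fault mapping of the kernel's vmf_error() helper (-ENOMEM to OOM, -EHWPOISON to HWPOISON, anything else, including -EIO, to SIGBUS). The model also assumes the folio is reclaimed between faults, so under the current behaviour every fault decompresses again; if the folio were still in the swapcache, the fault would instead die on !uptodate without decompressing.

```c
#include <errno.h>
#include <stdbool.h>

/* Linux value; fallback only for libcs that do not define it. */
#ifndef EHWPOISON
#define EHWPOISON 133
#endif

/*
 * Hypothetical user-space model of the fault paths discussed in this
 * thread. None of these types or functions exist in the kernel; the
 * FAULT_* values only stand in for VM_FAULT_OOM/SIGBUS/HWPOISON.
 */
enum model_fault { FAULT_OK, FAULT_OOM, FAULT_SIGBUS, FAULT_HWPOISON };

/* Assumed to roughly mirror the errno -> fault mapping of vmf_error(). */
enum model_fault model_vmf_error(int err)
{
	if (err == -ENOMEM)
		return FAULT_OOM;
	if (err == -EHWPOISON)
		return FAULT_HWPOISON;
	return FAULT_SIGBUS;	/* -EIO and everything else */
}

/* One swapped-out page whose compressed copy may be corrupt. */
struct model_slot {
	bool corrupt;			/* decompression always fails */
	bool poisoned;			/* a poison marker has been installed */
	int decompress_attempts;	/* how many times decompression ran */
};

/* Stand-in for a zswap load: 0 on success, -EIO on decompression failure. */
int model_zswap_load(struct model_slot *s)
{
	s->decompress_attempts++;
	return s->corrupt ? -EIO : 0;
}

/*
 * Current behaviour, simplified: the folio is left !uptodate, the read
 * error surfaces as -EIO (hence SIGBUS), and nothing is recorded, so a
 * later fault (assuming reclaim in between) decompresses again.
 */
enum model_fault fault_current(struct model_slot *s)
{
	int err = model_zswap_load(s);

	return err ? model_vmf_error(err) : FAULT_OK;
}

/*
 * Proposed behaviour: report the failure as -EHWPOISON and install a
 * poison marker, so later faults fail fast without decompressing.
 */
enum model_fault fault_proposed(struct model_slot *s)
{
	if (s->poisoned)
		return FAULT_HWPOISON;
	if (model_zswap_load(s)) {
		s->poisoned = true;
		return model_vmf_error(-EHWPOISON);
	}
	return FAULT_OK;
}
```

Faulting three times on a corrupt slot gives SIGBUS and three decompression attempts with fault_current(), versus HWPOISON and a single attempt with fault_proposed(), which is the fail-fast behaviour a poison marker would buy.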