Date: Tue, 25 Feb 2025 23:57:27 -0500
From: Johannes Weiner
To: Yosry Ahmed
Cc: Nhat Pham, akpm@linux-foundation.org, chengming.zhou@linux.dev,
	linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] zswap: do not crash the kernel on decompression failure
Message-ID: <20250226045727.GB1775487@cmpxchg.org>
References: <20250225213200.729056-1-nphamcs@gmail.com>

On Wed, Feb 26, 2025 at 03:12:35AM +0000, Yosry Ahmed wrote:
> On Tue, Feb 25, 2025 at 01:32:00PM -0800, Nhat Pham wrote:
> > Currently, we crash the kernel when a decompression failure occurs in
> > zswap (either because of memory corruption, or a bug in the compression
> > algorithm). This is overkill. We should only SIGBUS the unfortunate
> > process asking for the zswap entry on zswap load, and skip the corrupted
> > entry in zswap writeback.
>
> Some relevant observations/questions, but not really actionable for this
> patch, perhaps some future work, or more likely some incoherent
> illogical thoughts:
>
> (1) It seems like not making the folio uptodate will cause shmem faults
> to mark the swap entry as hwpoisoned, but I don't see similar handling
> for do_swap_page(). So it seems like even if we SIGBUS the process,
> other processes mapping the same page could follow in the same
> footsteps.

It's analogous to what __end_swap_bio_read() does for block backends,
so it's hitchhiking on the standard swap protocol for read failures.

The page sticks around if there are other users. It can get reclaimed,
but since it's not marked dirty, it won't get overwritten. Another
access will either find it in the swapcache and die on !uptodate, or,
if it was reclaimed, attempt another decompression. If all references
have been killed, zswap_invalidate() will finally drop it.

Swapoff actually poisons the page table as well (unuse_pte).

> (2) A hwpoisoned swap entry results in VM_FAULT_SIGBUS in some cases
> (e.g. shmem_fault() -> shmem_get_folio_gfp() -> shmem_swapin_folio()),
> even though we have VM_FAULT_HWPOISON. This patch falls under this
> bucket, but unfortunately we cannot tell for sure if it's a hwpoison or
> a decompression bug.

Are you sure? Actual memory failure should unmap a mapped shmem page
with TTU_HWPOISON, which replaces its ptes with special swap entries
that trigger VM_FAULT_HWPOISON in do_swap_page().

Anon swap distinguishes the two cases as long as the swapfile is there.
Swapoff installs poison markers, which are then handled the same in
future faults (VM_FAULT_HWPOISON):

/*
 * "Poisoned" here is meant in the very general sense of "future accesses are
 * invalid", instead of referring very specifically to hardware memory errors.
 * This marker is meant to represent any of various different causes of this.
 *
 * Note that, when encountered by the faulting logic, PTEs with this marker will
 * result in VM_FAULT_HWPOISON and thus regardless trigger hardware memory error
 * logic.
 */
#define PTE_MARKER_POISONED			BIT(1)
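
To make the distinction concrete, here is a rough user-space model of
how a fault on a non-present pte would be classified under the rules
described above. It is only a sketch, not kernel code: every model_*
name is made up for illustration, the enum values are arbitrary, and
the real logic lives in do_swap_page() and the pte marker handling.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's fault results; values are arbitrary. */
enum model_fault {
	MODEL_FAULT_NONE,	/* swap-in succeeded, page gets mapped */
	MODEL_FAULT_SIGBUS,	/* read/decompression failed, folio not uptodate */
	MODEL_FAULT_HWPOISON,	/* pte itself says "future accesses are invalid" */
};

/* Hypothetical classification of what a non-present pte holds. */
enum model_pte_kind {
	MODEL_PTE_SWAP,			/* regular swap entry (zswap or block backend) */
	MODEL_PTE_HWPOISON_ENTRY,	/* installed when memory failure unmaps a page */
	MODEL_PTE_POISON_MARKER,	/* PTE_MARKER_POISONED, e.g. left by swapoff */
};

/*
 * Classify a fault in this simplified model. read_ok only matters for
 * regular swap entries: it says whether the swap-in managed to fill the
 * folio and mark it uptodate.
 */
enum model_fault model_swap_fault(enum model_pte_kind kind, bool read_ok)
{
	switch (kind) {
	case MODEL_PTE_HWPOISON_ENTRY:
	case MODEL_PTE_POISON_MARKER:
		/* The pte encodes the failure; no read is attempted. */
		return MODEL_FAULT_HWPOISON;
	case MODEL_PTE_SWAP:
		/* A failed read leaves the swap entry in place for a retry. */
		return read_ok ? MODEL_FAULT_NONE : MODEL_FAULT_SIGBUS;
	}
	/* Unknown kind: treat it like a bad pte in this model. */
	return MODEL_FAULT_SIGBUS;
}
```

In this model a failed zswap decompression shows up as
model_swap_fault(MODEL_PTE_SWAP, false), i.e. SIGBUS with the swap
entry left intact, while both poison cases report HWPOISON no matter
what the backend would have returned.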