From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E4F54C021B8 for ; Wed, 26 Feb 2025 23:16:54 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7E9CF280002; Wed, 26 Feb 2025 18:16:54 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 799F4280001; Wed, 26 Feb 2025 18:16:54 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 661B2280002; Wed, 26 Feb 2025 18:16:54 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 497B5280001 for ; Wed, 26 Feb 2025 18:16:54 -0500 (EST) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id B111AB3ECD for ; Wed, 26 Feb 2025 23:16:53 +0000 (UTC) X-FDA: 83163657906.28.BBBFC0D Received: from mail-qv1-f51.google.com (mail-qv1-f51.google.com [209.85.219.51]) by imf02.hostedemail.com (Postfix) with ESMTP id D137C80002 for ; Wed, 26 Feb 2025 23:16:51 +0000 (UTC) Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=GG0lEAtC; spf=pass (imf02.hostedemail.com: domain of nphamcs@gmail.com designates 209.85.219.51 as permitted sender) smtp.mailfrom=nphamcs@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1740611811; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=0+wupZjcEefs4tpOMXq9CcK49xcNos1HQVedAtnerkA=; b=hAMxaC2JlS8s4pMrVtad2BWpv/FUoR81LQRpbwYEkyEnt/wRwF8KBrjp1Qpstx0WYWrvIW gYc7i3gmdXbjfPrsdRjS0VFMkMJMZY86d//iswKnx4Fi1eApgzbQHsdbo+5/57rlkMXZfm G7KUUM18e9bWDPI2wejAu1Y+L4FGMb4= ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=GG0lEAtC; spf=pass (imf02.hostedemail.com: domain of nphamcs@gmail.com designates 209.85.219.51 as permitted sender) smtp.mailfrom=nphamcs@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740611811; a=rsa-sha256; cv=none; b=dcVEisgOhqohrJxnL0bq4Xk8IXuTWenOFpORYGQDQsk8TbLUxBP+NoxYTXmx0xpSXYKtAl gLYnXfttzyi2zYmtpetj+yV/UYE0+wP3RjK0UNPRnIALxZO8f/Reicpri+0TjdMs6C+hC+ xJWgGuEVXSI3dY41bZt2hn4LaBVKQk4= Received: by mail-qv1-f51.google.com with SMTP id 6a1803df08f44-6e672a21b9dso3868346d6.1 for ; Wed, 26 Feb 2025 15:16:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1740611811; x=1741216611; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=0+wupZjcEefs4tpOMXq9CcK49xcNos1HQVedAtnerkA=; b=GG0lEAtCg0fkZsDvoFOOX6RjDfBw0UBjxooiytdbeOQzFaA53XCPnR3tv9vWkZ5IYH K8iaPHx5gZ3I9K64Ln8038JzBrRQhP+lZUIcODJOzPUZcv3a7udYrl8ZRW080ek/kes2 vagUskEaeb45ObOcYiCW0n811QJwCXI+xWB4pcg0U8GhPvx42ejaLZJDSA5OTIn2wT9z miv40my0f6zvXQiM7M8P+VDxIFEbNZi222kn6ZdwQ8n+CwYwVRmOkZGSOH/NIgoRMh+O j+w3V93XBVgol7pWk9NVnJIsJqQxru6l1oQumvSZxi9bL1eKEdYdb9IpImzJwfdkYyn3 rUdg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1740611811; x=1741216611; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=0+wupZjcEefs4tpOMXq9CcK49xcNos1HQVedAtnerkA=; b=T4kSyiol1lNaR9tpx5xs1rwa2S/C93RnEx+h0980s9TP5Dy/+dMvAGIooGP+rHMrWZ viFgErugy9iDeBjL6IQPDzEQ00stoZV1nVaLD4rZdXqXghevjN1hgH6MrqFC8wOrBLL7 K+r79hHkBCaQYDDOiS9Txq0Wfid6mpaLUuFONz6HSROPP9Qy+1a9fsvwQiAA/4q/Hwjv QHwoYK88RYBDFZdjz1K5wJUZK+w8QveXUDzU8XUMCnAhT4eSoEg8+46TF+U1whxBDQRW ov1yXvC1CVFF65AZzur3LGxIMnX8w4jLH5qNJkRfuPcXA7P0fkESICj0spF5N4qOlcEw AuKw== X-Forwarded-Encrypted: i=1; AJvYcCUJ3Rj1r04oxnmPy+CHwYUyWjiUe3BYpEWJUlSe35Ig3IO92b11COnSdK3bQ9bgn5dR57ovqrc33w==@kvack.org X-Gm-Message-State: AOJu0Yxd4DnS9dxGf7cYk6YNMiXQBC2PkD4IoU4OFIMlNlhVEhmCHFxu yal/8+ihDSJLh0/pD1fflsu+SeMJX1KqD+jcaY5Xrf6DfDq/+DYS3ub9buMbZLZD66cEDvK0MwH PSbv8AZUdDdyoNYnEpGRBWT62eiI= X-Gm-Gg: ASbGncts3Vguym2H1LVw7KjhoJWwg/42CarZiX4aXzVmxs6ueodbgVahNh9F5HopNFe km0DFMzJGbnwm8e1/SopIAdTqbiQW3BnIlz5AS5L+tklqTaXqus4sWlzyn0NScRIWyG4DMQZwfm PHKY6sWwMVButxF51AAUV4rYA= X-Google-Smtp-Source: AGHT+IGsasYny4HX7AkfjmHKLz/AeaS+A2cp7sxldHFFQq9cWhYyGqvPgHsvOo9UeHod2YDTS9m+WG/XuWusAjOAqnk= X-Received: by 2002:ad4:4eaf:0:b0:6e6:659d:296 with SMTP id 6a1803df08f44-6e886879664mr71188866d6.5.1740611810949; Wed, 26 Feb 2025 15:16:50 -0800 (PST) MIME-Version: 1.0 References: <20250225213200.729056-1-nphamcs@gmail.com> In-Reply-To: From: Nhat Pham Date: Wed, 26 Feb 2025 15:16:39 -0800 X-Gm-Features: AQ5f1JpO6T6yzjZ0ZX6F4L_ilnBQKvaKjWzz5wbWmUYQJk0UFPUeeOURG640_uw Message-ID: Subject: Re: [PATCH] zswap: do not crash the kernel on decompression failure To: Yosry Ahmed Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, chengming.zhou@linux.dev, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam11 X-Rspamd-Queue-Id: D137C80002 X-Stat-Signature: 5x8noe3m9dy7qrwgrjdypk9axd7jf6gj X-HE-Tag: 1740611811-370578 X-HE-Meta: U2FsdGVkX1+xwTKx6l7FIPgG0lCzFxZA9l7wUADEel0EHjWa0joLxslUVPpv5luSNx230w2MltyRRcvf8iauWhja9eCLhqLw2JY+91gCvPtbHCEkgQsH864BVxZoPTkXtVMNBNnJqmHX+BNLi+k3AAArgiWDRjI7a5lOUZ5y+Xf0sKVJWaY8AizE7eIM2A4kySG/wT6vcMQXeU5RegehjqTOS8RavsCgzJcdc0DD+gWaowPXG4x41lSzrX3jVqOkvz77UEsTiAui+gA/anuBpuJk11cz+oOxpAzTiRyDmnq1k0yfgv9kQpxf1V3U49pz1cltM/NVc9IxfA4EgAoAxXZq+ty7aJU40xDK528zGrm/8csWYeh09TbqKtZ8Z+Mv6eD0eZmSTc4x8EhV/g/YNz/8L+OIEBCWGnzGl4E//W+c/zxCINdkzTqnqbOmHYAuqQRW4ELFXPxg6BIBqyhJpEAd5bK5cGi2gcQiRKy07uoikohuDXJy75FPEUcbNkJg7zpLPIaEly9xGgdG6vNnxZTisyRnEL5SDyfLpanjLmRAPPegGTiuGJ2nYEBSY+1O5oZGI0mgaOfhDqcVe5GTRPQms567d/jSjrGJepfa2fwDeoBhtZhzjehAb0WHYs3bADLohgXOXDCVrcEfw44Q4QWg3uGqCfCO6PSsdVzS1AGW08gjrDj8l04rUosD0PGPnzXWTzJqSzq9vyj8lZ6e2eRHHFqPNJrHbIVBUWtPCRorvFUG2COGcyYYSM6B2gKRafBc0SwsbwPMCUuW+j4agozoBPWn7XyRb1VuC6sSjF+EcvMItSxrvXR91vANvCzC6AdX3J9gxIZZD4hS4CDRJ+jCpNdDClrAVnqia2QhirmgOIcKwq12jkz8Bo8qpThGmRk215tYc9XQWeymf+y/R9V/qPqCF6XBqESPaj94RSPsaBxbQck94MlD+qHJKWQLWuQTPDYLRTDqe6ajtxL SiMCKyao nNrjV7toh8PdWDtUETA2iOoPReN1BJGJdaIFQ/QrzQl3wuWrTqrAi8Jk6Q+fCinaIcEj47ff4H9wOJj0luMKTY2hIfV+jYxE7C9I8bSWgqaYU9kKJ4B2Vr/KgStlyQbvygZqy9ptgopAn0J7XjPA9zj86MVYAgZx69m8z9OAGQCLcTEKJ0ovCdZl7fVpeyc83c6K9wq/CohnkINgOIJRieWFqhW7FpYSPxNjyLhyKL0rY0BwcbJlRz3PWSPenxpQ4jYkyx0Uycn5xHdlHWxwUYStLy6E5j4bMDFaYwqZWT56jl4tRXxUTJMisKv4LsKau/N/387xYlRthicrCnMNWS0oR+AWkKMAH5fvNUa/Ob7vWN4vUfbNmrU3UwJ7+pr899DytVpI9oqPJp0jjmu6wnVH0dA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Feb 25, 2025 at 6:40=E2=80=AFPM Yosry Ahmed = wrote: > > On Tue, Feb 25, 2025 at 01:32:00PM -0800, Nhat Pham wrote: > > Currently, we crash the kernel when a decompression failure occurs in > > zswap (either because of memory corruption, or a bug in the compression > > algorithm). This is overkill. We should only SIGBUS the unfortunate > > process asking for the zswap entry on zswap load, and skip the corrupte= d > > entry in zswap writeback. > > It's probably worth adding more details here. For example, how the > SIGBUS is delivered (it's not super clear in the code), and the > distinction between handling decompression failures for loads and > writeback. Will do :) > > > > > See [1] for a recent upstream discussion about this. > > > > [1]: https://lore.kernel.org/all/ZsiLElTykamcYZ6J@casper.infradead.org/ > > > > Suggested-by: Matthew Wilcox > > Suggested-by: Yosry Ahmed > > It's Yosry Ahmed now :P Oops, will fix :P > > > Signed-off-by: Nhat Pham > > --- > > mm/zswap.c | 85 +++++++++++++++++++++++++++++++++++++----------------- > > 1 file changed, 58 insertions(+), 27 deletions(-) > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > index f6316b66fb23..31d4397eed61 100644 > > --- a/mm/zswap.c > > +++ b/mm/zswap.c > > @@ -62,6 +62,8 @@ static u64 zswap_reject_reclaim_fail; > > static u64 zswap_reject_compress_fail; > > /* Compressed page was too big for the allocator to (optimally) store = */ > > static u64 zswap_reject_compress_poor; > > +/* Load and writeback failed due to decompression failure */ > > Nit: Load or writeback? Consider it done :) > > > +static u64 zswap_reject_decompress_fail; > > /* Store failed because underlying allocator could not get memory */ > > static u64 zswap_reject_alloc_fail; > > /* Store failed because the entry metadata could not be allocated (rar= e) */ > > @@ -953,11 +955,12 @@ static bool zswap_compress(struct page *page, str= uct zswap_entry *entry, > > return comp_ret =3D=3D 0 && alloc_ret =3D=3D 0; > > } > > > > -static void zswap_decompress(struct zswap_entry *entry, struct folio *= folio) > > +static bool zswap_decompress(struct zswap_entry *entry, struct folio *= folio) > > { > > struct zpool *zpool =3D entry->pool->zpool; > > struct scatterlist input, output; > > struct crypto_acomp_ctx *acomp_ctx; > > + bool ret =3D true; > > u8 *src; > > > > acomp_ctx =3D raw_cpu_ptr(entry->pool->acomp_ctx); > > @@ -984,12 +987,19 @@ static void zswap_decompress(struct zswap_entry *= entry, struct folio *folio) > > sg_init_table(&output, 1); > > sg_set_folio(&output, folio, PAGE_SIZE, 0); > > acomp_request_set_params(acomp_ctx->req, &input, &output, entry->= length, PAGE_SIZE); > > - BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &= acomp_ctx->wait)); > > - BUG_ON(acomp_ctx->req->dlen !=3D PAGE_SIZE); > > + if (crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &aco= mp_ctx->wait) || > > + acomp_ctx->req->dlen !=3D PAGE_SIZE) { > > + ret =3D false; > > + zswap_reject_decompress_fail++; > > + pr_alert_ratelimited( > > + "decompression failed on zswap entry with offset = %08lx\n", > > + entry->swpentry.val); > > + } > > This can probably be prettier if we save the return value of > crypto_wait_req() in a variable, and then do the check at the end of the > function: > > zswap_decompress() > { > mutex_lock(); > ... > ret =3D crypto_wait_req(..); > ... > mutex_unlock(); > ... > if (ret || acomp_ctx->req->dlen !=3D PAGE_SIZE) { Hmmmm, do we need the mutex to protect acomp_ctx->req->dlen? No strong opinions here from my end TBH. > /* SHIT */ > return false; > } > return true; > } [...] > > We are doing load + erase here and in the writeback path now, so two > xarray walks instead of one. How does this affect performance? We had a > similar about the possiblity of doing a lockless xas_load() followed by > a conditional xas_erase() for zswap_invalidate(): > > https://lore.kernel.org/lkml/20241018192525.95862-1-ryncsn@gmail.com/ > > Unfortunately it seems like we can't trivially do that unless we keep > the tree locked, which we probably don't want to do throughout > decompression. > > How crazy would it be to remove the entry from the tree and re-add it if > compression fails? Does swapcache_prepare() provide sufficient > protection for us to do this without anyone else looking at this entry > (seems like it)? My concern is, what happens if xa_store() in the re-add path fails because we do not have enough memory? > > Anyway, this is all moot if the second walk is not noticeable from a > perf perspective.