Date: Thu, 6 Mar 2025 21:32:29 +0000
From: Yosry Ahmed
To: Nhat Pham
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, chengming.zhou@linux.dev, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4] page_io: zswap: do not crash the kernel on decompression failure
References: <20250306205011.784787-1-nphamcs@gmail.com>
In-Reply-To: <20250306205011.784787-1-nphamcs@gmail.com>

On Thu, Mar 06, 2025 at 12:50:10PM -0800, Nhat Pham wrote:
> Currently, we crash the kernel when a decompression failure occurs in
> zswap (either because of memory corruption, or a bug in the compression
> algorithm). This is overkill. We should only SIGBUS the unfortunate
> process asking for the zswap entry on zswap load, and skip the corrupted
> entry in zswap writeback.
>
> See [1] for a recent upstream discussion about this.
>
> The zswap writeback case is relatively straightforward to fix. For the
> zswap_load() case, we change the return behavior:
>
> * Return 0 on success.
> * Return -ENOENT (with the folio locked) if zswap does not own the
>   swapped out content.
> * Return -EIO if zswap owns the swapped out content, but encounters a
>   decompression failure for some reasons. The folio will be unlocked,
>   but not be marked up-to-date, which will eventually cause the process
>   requesting the page to SIGBUS (see the handling of not-up-to-date
>   folio in do_swap_page() in mm/memory.c), without crashing the kernel.
> * Return -EINVAL if we encounter a large folio, as large folio should
>   not be swapped in while zswap is being used. Similar to the -EIO case,
>   we also unlock the folio but do not mark it as up-to-date to SIGBUS
>   the faulting process.
>
> As a side effect, we require one extra zswap tree traversal in the load
> and writeback paths. Quick benchmarking on a kernel build test shows no
> performance difference:
>
> With the new scheme:
> real: mean: 125.1s, stdev: 0.12s
> user: mean: 3265.23s, stdev: 9.62s
> sys: mean: 2156.41s, stdev: 13.98s
>
> The old scheme:
> real: mean: 125.78s, stdev: 0.45s
> user: mean: 3287.18s, stdev: 5.95s
> sys: mean: 2177.08s, stdev: 26.52s
>
> [1]: https://lore.kernel.org/all/ZsiLElTykamcYZ6J@casper.infradead.org/
>
> Suggested-by: Matthew Wilcox
> Suggested-by: Yosry Ahmed
> Suggested-by: Johannes Weiner
> Signed-off-by: Nhat Pham

Couple of nits below, but otherwise LGTM:

Acked-by: Yosry Ahmed

(I did expect the swap zeromap change in the same series, so if you send
it separately make sure to mention it's on top of this one because they
will conflict otherwise)

[..]

> @@ -1606,7 +1628,26 @@ bool zswap_store(struct folio *folio)
>  	return ret;
>  }
>
> -bool zswap_load(struct folio *folio)
> +/**
> + * zswap_load() - load a page from zswap

nit: folio

> + * @folio: folio to load
> + *
> + * Return: 0 on success, or one of the following error codes:

nit: Maybe worth mentioning that the folio is unlocked and marked
uptodate on success for completeness.

> + *
> + * -EIO: if the swapped out content was in zswap, but could not be loaded
> + * into the page due to a decompression failure. The folio is unlocked, but
> + * NOT marked up-to-date, so that an IO error is emitted (e.g. do_swap_page()
> + * will SIGBUS).
> + *
> + * -EINVAL: if the swapped out content was in zswap, but the page belongs
> + * to a large folio, which is not supported by zswap. The folio is unlocked,
> + * but NOT marked up-to-date, so that an IO error is emitted (e.g.
> + * do_swap_page() will SIGBUS).
> + *
> + * -ENOENT: if the swapped out content was not in zswap. The folio remains
> + * locked on return.
> + */
> +int zswap_load(struct folio *folio)

[..]
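
For anyone skimming the thread, the return contract documented in the
kerneldoc above can be modelled in userspace C roughly as below. This is
only a sketch, not the patch's code: struct mock_folio, enum
mock_entry_state, mock_zswap_load() and needs_backing_read() are all
hypothetical names made up for illustration, and the order of the checks
is an assumption taken from the wording of the comment rather than from
the actual implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct folio: only the state the contract mentions. */
struct mock_folio {
	bool locked;
	bool uptodate;
	bool large;
};

/* Hypothetical model of what zswap holds for the swap slot. */
enum mock_entry_state { ENTRY_ABSENT, ENTRY_GOOD, ENTRY_CORRUPT };

/* Mirrors the documented return values; not the kernel implementation. */
static int mock_zswap_load(struct mock_folio *folio, enum mock_entry_state state)
{
	assert(folio->locked);		/* assumed: caller passes a locked folio */

	if (state == ENTRY_ABSENT)
		return -ENOENT;		/* not in zswap: folio remains locked */

	if (folio->large) {
		folio->locked = false;	/* unlocked, NOT marked up-to-date */
		return -EINVAL;
	}

	if (state == ENTRY_CORRUPT) {
		folio->locked = false;	/* unlocked, NOT marked up-to-date */
		return -EIO;
	}

	folio->uptodate = true;		/* success: marked up-to-date and unlocked */
	folio->locked = false;
	return 0;
}

/*
 * Hypothetical caller-side helper: only -ENOENT means zswap did not own
 * the data, so only then would the caller go on to read the backing
 * swap device. Every other result leaves the folio already unlocked.
 */
static bool needs_backing_read(int err)
{
	return err == -ENOENT;
}
```

The point of the model is that -EIO and -EINVAL look the same to the
caller (folio unlocked, not up-to-date), which is the state the
kerneldoc says leads do_swap_page() to SIGBUS, while -ENOENT is the only
case where the folio is still locked and the read continues elsewhere.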