From: Nhat Pham <nphamcs@gmail.com>
Date: Tue, 17 Oct 2023 12:04:37 -0700
Subject: Re: [PATCH 0/2] minimize swapping on zswap store failure
To: Yosry Ahmed
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, cerasuolodomenico@gmail.com,
 sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com,
 hughd@google.com, corbet@lwn.net, konrad.wilk@oracle.com,
 senozhatsky@chromium.org, rppt@kernel.org, linux-mm@kvack.org,
 kernel-team@meta.com, linux-kernel@vger.kernel.org, david@ixit.cz

On Tue, Oct 17, 2023 at 12:03 PM Nhat Pham wrote:
>
> On Mon, Oct 16, 2023 at 5:58 PM Yosry Ahmed wrote:
> >
> > On Mon, Oct 16, 2023 at 5:35 PM Nhat Pham wrote:
> > >
> > > Currently, when a zswap store attempt fails, the page is immediately
> > > swapped out. This could happen for a variety of reasons.
> > > For instance, the compression algorithm could fail (such as when the
> > > data is not compressible), or the backend allocator might not be able
> > > to find a suitable slot for the compressed page. If these pages are
> > > needed later on, users will incur IOs from swapins.
> > >
> > > This issue prevents the adoption of zswap for potential users who
> > > cannot tolerate the latency associated with swapping. In many cases,
> > > these IOs are avoidable if we just keep in memory the pages that
> > > zswap fails to store.
> > >
> > > This patch series adds two new features for zswap that will alleviate
> > > the risk of swapping:
> > >
> > > a) When a store attempt fails, keep the page untouched in memory
> > > instead of swapping it out.
> >
> > What about writeback when the zswap limit is hit? I understand the
> > problem, but I am wondering if this is the correct way of fixing it.
> > We really need to make zswap work without a backing swapfile, which I
> > think is the correct way to fix all these problems. I was working on
> > that, but unfortunately I had to pivot to something else before I had
> > something that was working.
> >
> > At Google, we have "ghost" swapfiles that we use just to use zswap
> > without a swapfile. They are sparse files, and we have internal kernel
> > patches to flag them and never try to actually write to them.
> >
> > I am not sure how many bandaids we can afford before doing the right
> > thing. I understand it's a much larger surgery; perhaps there is a way
> > to get a short-term fix that is also a step towards the final state we
> > want to reach instead?
>
> Regarding the writeback - I'll make sure to also short-circuit writeback
> when the bypass_swap option is enabled in v2 :) I'll probably send out
> v2 after ... I gather all these feedbacks.
> I absolutely agree that we must decouple zswap and swap (and would
> be happy to help out in any capacity I can - we have heard similar
> concerns/complaints about swap wastage from internal parties as well).
>
> However, as Johannes has pointed out, this feature still has its place,
> given our already existing swapfile deployments. I do agree that a
> global knob is insufficient, though. I'll add a per-cgroup knob in v2
> so that we can enable/disable this feature on a per-workload basis.
>
> > > b) If the store attempt fails at the compression step, allow the page
> > > to be stored in its uncompressed form in the zswap pool. This
> > > maintains the LRU ordering of pages, which will be helpful for
> > > accurate memory reclaim (zswap writeback in particular).
> >
> > This is dangerous. Johannes and I discussed this before. This means
> > that reclaim can end up allocating more memory instead of freeing.
> > Allocations made in the reclaim path are made under the assumption
> > that we will eventually free memory. In this case, we won't. In the
> > worst-case scenario, reclaim can leave the system/memcg in a worse
> > state than before it started.
> >
> > Perhaps there is a way we can do this without allocating a zswap entry?
> >
> > I thought before about having a special list_head that allows us to
> > use the lower bits of the pointers as markers, similar to the xarray.
> > The markers can be used to place different objects on the same list.
> > We can have a list that is a mixture of struct page and struct
> > zswap_entry. I never pursued this idea, and I am sure someone will
> > scream at me for suggesting it. Maybe there is a less convoluted way
> > to keep the LRU ordering intact without allocating memory on the
> > reclaim path.
>
> Hmm yeah, you're right about these concerns. That seems like a lot more
> involved than what I envisioned initially.
>
> Let's put this aside for now.
> I'll just send the first patch in v2, and we can work on and discuss
> uncompressed page storing more later on.
>
> > > These features can be enabled independently via two new zswap module
> > > parameters.
> > >
> > > Nhat Pham (2):
> > >   swap: allows swap bypassing on zswap store failure
> > >   zswap: store uncompressed pages when compression algorithm fails
> > >
> > >  Documentation/admin-guide/mm/zswap.rst | 16 +++++++
> > >  include/linux/zswap.h                  |  9 ++++
> > >  mm/page_io.c                           |  6 +++
> > >  mm/shmem.c                             |  8 +++-
> > >  mm/zswap.c                             | 64 ++++++++++++++++++++++----
> > >  5 files changed, 95 insertions(+), 8 deletions(-)
> > >
> > > --
> > > 2.34.1