From mboxrd@z Thu Jan  1 00:00:00 1970
From: Nhat Pham <nphamcs@gmail.com>
Date: Tue, 17 Oct 2023 12:03:10 -0700
Subject: Re: [PATCH 0/2] minimize swapping on zswap store failure
To: Yosry Ahmed
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org,
	cerasuolodomenico@gmail.com, sjenning@redhat.com, ddstreet@ieee.org,
	vitaly.wool@konsulko.com, hughd@google.com, corbet@lwn.net,
	konrad.wilk@oracle.com, senozhatsky@chromium.org, rppt@kernel.org,
	linux-mm@kvack.org, kernel-team@meta.com,
	linux-kernel@vger.kernel.org, david@ixit.cz
References: <20231017003519.1426574-1-nphamcs@gmail.com>
Content-Type: text/plain; charset="UTF-8"
On Mon, Oct 16, 2023 at 5:58 PM Yosry Ahmed wrote:
>
> On Mon, Oct 16, 2023 at 5:35 PM Nhat Pham wrote:
> >
> > Currently, when a zswap store attempt fails, the page is immediately
> > swapped out. This could happen for a variety of reasons.
> > For instance, the compression algorithm could fail (such as when the
> > data is not compressible), or the backend allocator might not be
> > able to find a suitable slot for the compressed page. If these pages
> > are needed later on, users will incur IOs from swapins.
> >
> > This issue prevents the adoption of zswap by potential users who
> > cannot tolerate the latency associated with swapping. In many cases,
> > these IOs are avoidable if we just keep in memory the pages that
> > zswap fails to store.
> >
> > This patch series adds two new features to zswap that will alleviate
> > the risk of swapping:
> >
> > a) When a store attempt fails, keep the page untouched in memory
> > instead of swapping it out.
>
> What about writeback when the zswap limit is hit? I understand the
> problem, but I am wondering if this is the correct way of fixing it.
> We really need to make zswap work without a backing swapfile, which I
> think is the correct way to fix all these problems. I was working on
> that, but unfortunately I had to pivot to something else before I had
> something that was working.
>
> At Google, we have "ghost" swapfiles that we use just to use zswap
> without a swapfile. They are sparse files, and we have internal
> kernel patches to flag them and never try to actually write to them.
>
> I am not sure how many bandaids we can afford before doing the right
> thing. I understand it's a much larger surgery; perhaps there is a
> way to get a short-term fix that is also a step towards the final
> state we want to reach instead?

Regarding the writeback - I'll make sure to also short-circuit
writeback when the bypass_swap option is enabled in v2 :) I'll
probably send out v2 after this discussion settles.

I absolutely agree that we must decouple zswap and swap (and would be
happy to help out in any capacity I could - we have heard similar
concerns/complaints about swap wastage from internal parties as well).
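For reference, the behavior proposed in (a) can be sketched as a tiny
userspace simulation. All names here are hypothetical illustrations,
not the actual kernel interfaces; the real decision would sit on the
reclaim path around zswap's store hook:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of feature (a): on zswap store failure, keep the
 * page resident instead of falling through to swap I/O. */
enum store_result { STORED_COMPRESSED, KEPT_IN_MEMORY, SWAPPED_OUT };

static bool bypass_swap_enabled = true;   /* the proposed knob */

/* Stand-in for zswap_store(): fails on e.g. incompressible data. */
static bool zswap_store_sim(bool compressible)
{
	return compressible;
}

static enum store_result reclaim_page_sim(bool compressible)
{
	if (zswap_store_sim(compressible))
		return STORED_COMPRESSED;
	/* Store failed: with the knob enabled, leave the page on the
	 * LRU rather than issuing swap I/O. */
	if (bypass_swap_enabled)
		return KEPT_IN_MEMORY;
	return SWAPPED_OUT;
}
```

The point of the knob is visible in the last two branches: the failure
path becomes a no-op for the page instead of a swap write.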
However, as Johannes has pointed out, this feature still has its
place, given our already existing swapfile deployments. I do agree
that a global knob is insufficient, though. I'll add a per-cgroup knob
in v2 so that we can enable/disable this feature on a per-workload
basis.

> >
> > b) If the store attempt fails at the compression step, allow the
> > page to be stored in its uncompressed form in the zswap pool. This
> > maintains the LRU ordering of pages, which will be helpful for
> > accurate memory reclaim (zswap writeback in particular).
>
> This is dangerous. Johannes and I discussed this before. This means
> that reclaim can end up allocating more memory instead of freeing.
> Allocations made in the reclaim path are made under the assumption
> that we will eventually free memory. In this case, we won't. In the
> worst case scenario, reclaim can leave the system/memcg in a worse
> state than before it started.
>
> Perhaps there is a way we can do this without allocating a zswap entry?
>
> I thought before about having a special list_head that allows us to
> use the lower bits of the pointers as markers, similar to the xarray.
> The markers can be used to place different objects on the same list.
> We can have a list that is a mixture of struct page and struct
> zswap_entry. I never pursued this idea, and I am sure someone will
> scream at me for suggesting it. Maybe there is a less convoluted way
> to keep the LRU ordering intact without allocating memory on the
> reclaim path.

Hmm, yeah, you're right about these concerns. That seems a lot more
involved than what I envisioned initially. Let's put this aside for
now. I'll just send the first patch in v2, and we can work on and
discuss storing uncompressed pages later on.

> >
> > These features could be enabled independently via two new zswap
> > module parameters.
> >
> > Nhat Pham (2):
> >   swap: allows swap bypassing on zswap store failure
> >   zswap: store uncompressed pages when compression algorithm fails
> >
> >  Documentation/admin-guide/mm/zswap.rst | 16 +++++++
> >  include/linux/zswap.h                  |  9 ++++
> >  mm/page_io.c                           |  6 +++
> >  mm/shmem.c                             |  8 +++-
> >  mm/zswap.c                             | 64 +++++++++++++++++++++++---
> >  5 files changed, 95 insertions(+), 8 deletions(-)
> >
> > --
> > 2.34.1
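[Editor's note: the "lower bits of the pointers as markers" idea
discussed above can be illustrated in userspace C. This is only a
sketch under the assumption that both struct page and struct
zswap_entry are at least word-aligned, so bit 0 of any pointer to them
is free to carry a type tag; the names are hypothetical, not kernel
APIs.]

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tagging scheme: bit 0 of a pointer stored on a mixed
 * LRU list marks whether it points at a zswap_entry (1) or a page (0).
 * Safe because malloc/slab allocations are word-aligned, so bit 0 of a
 * valid pointer is always zero. */
#define ZSWAP_ENTRY_TAG 0x1UL

static inline void *tag_zswap_entry(void *p)
{
	return (void *)((uintptr_t)p | ZSWAP_ENTRY_TAG);
}

static inline int is_zswap_entry(const void *p)
{
	return (uintptr_t)p & ZSWAP_ENTRY_TAG;
}

static inline void *untag_ptr(const void *p)
{
	return (void *)((uintptr_t)p & ~ZSWAP_ENTRY_TAG);
}
```

A list walker would check is_zswap_entry() on each node and untag
before dereferencing, which is essentially how the XArray
distinguishes internal entries from normal pointers.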