From: Yosry Ahmed <yosryahmed@google.com>
Date: Mon, 16 Oct 2023 17:57:31 -0700
Subject: Re: [PATCH 0/2] minimize swapping on zswap store failure
To: Nhat Pham <nphamcs@gmail.com>
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org,
	cerasuolodomenico@gmail.com, sjenning@redhat.com, ddstreet@ieee.org,
	vitaly.wool@konsulko.com, hughd@google.com, corbet@lwn.net,
	konrad.wilk@oracle.com, senozhatsky@chromium.org, rppt@kernel.org,
	linux-mm@kvack.org, kernel-team@meta.com,
	linux-kernel@vger.kernel.org, david@ixit.cz
In-Reply-To: <20231017003519.1426574-1-nphamcs@gmail.com>

On Mon, Oct 16, 2023 at 5:35 PM Nhat Pham <nphamcs@gmail.com> wrote:
>
> Currently, when a zswap store attempt fails, the page is immediately
> swapped out. This could happen for a variety of reasons.
> For instance, the compression algorithm could fail (such as when the
> data is not compressible), or the backend allocator might not be able
> to find a suitable slot for the compressed page. If these pages are
> needed later on, users will incur IOs from swapins.
>
> This issue prevents the adoption of zswap for potential users who
> cannot tolerate the latency associated with swapping. In many cases,
> these IOs are avoidable if we just keep in memory the pages that zswap
> fails to store.
>
> This patch series adds two new features for zswap that will alleviate
> the risk of swapping:
>
> a) When a store attempt fails, keep the page untouched in memory
> instead of swapping it out.

What about writeback when the zswap limit is hit?

I understand the problem, but I am wondering if this is the correct way
of fixing it. We really need to make zswap work without a backing
swapfile, which I think is the correct way to fix all these problems. I
was working on that, but unfortunately I had to pivot to something else
before I had something working.

At Google, we have "ghost" swapfiles that exist solely so that we can
use zswap without a real swapfile. They are sparse files, and we carry
internal kernel patches to flag them and never actually write to them.

I am not sure how many bandaids we can afford before doing the right
thing. I understand it's a much larger surgery; perhaps there is a way
to get a short-term fix that is also a step towards the final state we
want to reach instead?

>
> b) If the store attempt fails at the compression step, allow the page
> to be stored in its uncompressed form in the zswap pool. This
> maintains the LRU ordering of pages, which will be helpful for
> accurate memory reclaim (zswap writeback in particular).

This is dangerous. Johannes and I discussed this before. This means
that reclaim can end up allocating more memory instead of freeing it.
Allocations made in the reclaim path are made under the assumption that
we will eventually free memory. In this case, we won't. In the worst
case scenario, reclaim can leave the system/memcg in a worse state than
before it started.

Perhaps there is a way we can do this without allocating a zswap entry?

I thought before about having a special list_head that allows us to use
the lower bits of the pointers as markers, similar to the xarray. The
markers can be used to place different objects on the same list. We can
have a list that is a mixture of struct page and struct zswap_entry (a
rough sketch of what I mean is at the end of this mail). I never
pursued this idea, and I am sure someone will scream at me for
suggesting it. Maybe there is a less convoluted way to keep the LRU
ordering intact without allocating memory on the reclaim path.

>
> These features could be enabled independently via two new zswap module
> parameters.
>
> Nhat Pham (2):
>   swap: allows swap bypassing on zswap store failure
>   zswap: store uncompressed pages when compression algorithm fails
>
>  Documentation/admin-guide/mm/zswap.rst | 16 +++++++
>  include/linux/zswap.h                  |  9 ++++
>  mm/page_io.c                           |  6 +++
>  mm/shmem.c                             |  8 +++-
>  mm/zswap.c                             | 64 +++++++++++++++++++++++---
>  5 files changed, 95 insertions(+), 8 deletions(-)
>
> --
> 2.34.1
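
Here is a minimal userspace sketch of the low-bit tagging idea above, in
case it helps. The struct names, helpers, and the fixed-size array that
stands in for an LRU are all made up for illustration; this is not the
actual kernel data structure work, just a demonstration that the low bit
of an aligned pointer can carry an object-type marker:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-ins for struct page and struct zswap_entry; real layouts differ. */
struct fake_page        { int pfn; };
struct fake_zswap_entry { int handle; };

#define LRU_TYPE_PAGE  ((uintptr_t)0x0) /* low bit clear: a "page" */
#define LRU_TYPE_ENTRY ((uintptr_t)0x1) /* low bit set: a "zswap entry" */
#define LRU_TYPE_MASK  ((uintptr_t)0x1)

/*
 * Both structs here are at least 4-byte aligned (they hold an int), so
 * bit 0 of a pointer to them is always zero and can carry a type marker,
 * similar to how the xarray tags pointers.
 */
static uintptr_t lru_encode(void *obj, uintptr_t type)
{
	assert(((uintptr_t)obj & LRU_TYPE_MASK) == 0);
	return (uintptr_t)obj | type;
}

static uintptr_t lru_type(uintptr_t val)
{
	return val & LRU_TYPE_MASK;
}

static void *lru_object(uintptr_t val)
{
	return (void *)(val & ~LRU_TYPE_MASK);
}

int main(void)
{
	struct fake_page p = { .pfn = 42 };
	struct fake_zswap_entry e = { .handle = 7 };

	/* A mixed "LRU": tagged pointers to two different object types. */
	uintptr_t lru[] = {
		lru_encode(&p, LRU_TYPE_PAGE),
		lru_encode(&e, LRU_TYPE_ENTRY),
	};

	for (size_t i = 0; i < sizeof(lru) / sizeof(lru[0]); i++) {
		if (lru_type(lru[i]) == LRU_TYPE_PAGE) {
			struct fake_page *page = lru_object(lru[i]);
			printf("slot %zu: page, pfn %d\n", i, page->pfn);
		} else {
			struct fake_zswap_entry *entry = lru_object(lru[i]);
			printf("slot %zu: zswap entry, handle %d\n",
			       i, entry->handle);
		}
	}
	return 0;
}

A real implementation would need proper list plumbing and an audit of
every LRU walker in zswap, so please treat this purely as a sketch of
the pointer-tagging trick, not as a proposal for the final design.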