Date: Tue, 17 Oct 2023 10:51:24 -0400
From: Johannes Weiner
To: Yosry Ahmed
Cc: Nhat Pham, akpm@linux-foundation.org, cerasuolodomenico@gmail.com, sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com, hughd@google.com, corbet@lwn.net, konrad.wilk@oracle.com, senozhatsky@chromium.org, rppt@kernel.org, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org, david@ixit.cz, Wei Xu, Chris Li, Greg Thelen
Subject: Re: [PATCH 0/2] minimize swapping on zswap store failure
Message-ID: <20231017145124.GA1122010@cmpxchg.org>
References: <20231017003519.1426574-1-nphamcs@gmail.com> <20231017044745.GC1042487@cmpxchg.org>

On Mon, Oct 16, 2023 at 10:33:23PM -0700, Yosry Ahmed wrote:
> On Mon, Oct 16, 2023 at 9:47 PM Johannes Weiner wrote:
> > On Mon, Oct 16, 2023 at 05:57:31PM -0700, Yosry Ahmed wrote:
> > > On Mon, Oct 16, 2023 at 5:35 PM Nhat Pham wrote:
> > So I obviously agree that we still
> > need to invest in decoupling zswap
> > space from physical disk slots. It's insanely wasteful, especially
> > with larger memory capacities. But while it would be a fantastic
> > optimization, I don't see how it would be an automatic solution to the
> > problem that inspired this proposal.
>
> Well, in my head, I imagine a world where we have multiple
> separate swapping backends with cgroup knob(s) that control what
> backends are allowed for each cgroup. A zswap-is-terminal knob is a
> hacky-ish way of doing that where the backends are only zswap and disk
> swap.

"I want compression" vs "I want disk offloading" is a more reasonable
question to ask at the cgroup level.

Historically, we've had a variety of swap configurations across the
fleet. E.g. it's a lot easier to add another swapfile than it is to
grow an existing one at runtime. In other cases, one storage config
might have a single swapfile, while another machine model might want
to spread swap out over multiple disks, etc.

This doesn't matter much with ghost files. But with conventional
swapfiles, selecting allowed backends per cgroup requires an
unnecessary awareness of the backend topology in order to express
container policy. That's no bueno.

> > > Perhaps there is a way we can do this without allocating a zswap entry?
> > >
> > > I thought before about having a special list_head that allows us to
> > > use the lower bits of the pointers as markers, similar to the xarray.
> > > The markers can be used to place different objects on the same list.
> > > We can have a list that is a mixture of struct page and struct
> > > zswap_entry. I never pursued this idea, and I am sure someone will
> > > scream at me for suggesting it. Maybe there is a less convoluted way
> > > to keep the LRU ordering intact without allocating memory on the
> > > reclaim path.
> >
> > That should work. Once zswap has exclusive control over the page, it
> > is free to muck with its LRU linkage. A lower bit tag on the next or
> > prev pointer should suffice to distinguish between struct page and
> > struct zswap_entry when pulling stuff from the list.
>
> Right.
>
> We handle incompressible memory internally in a different way: we put
> those pages back on the unevictable list with an incompressible page
> flag. This achieves a similar effect.

It doesn't. We want those incompressible pages to continue aging
alongside their compressible peers, and eventually get written back to
disk with them.
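A minimal userspace sketch of the tagged-list idea discussed in this thread, assuming the tag is kept in bit 0 of each node's own `next` field (one of the two options mentioned; tagging `prev` would work the same way). Everything here is hypothetical and is not kernel code: `struct tlist`, `page_stub`, `zswap_entry_stub`, `TAG_ZSWAP` and the `tl_*` helpers are illustrative names. It shows uncompressed (e.g. incompressible) pages and compressed entries sharing one LRU, so a writeback walker pulling from the tail sees them in aging order and dispatches on the tag without allocating anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical tagged list node, loosely modeled on the kernel's
 * struct list_head. Bit 0 of a node's own `next` field records what
 * the node is embedded in: 0 = a page, 1 = a zswap entry. This works
 * because nodes are at least pointer-aligned, so bit 0 of their
 * addresses is always clear.
 */
struct tlist {
	uintptr_t next;     /* tagged pointer to the next node */
	struct tlist *prev; /* plain pointer to the previous node */
};

#define TAG_ZSWAP ((uintptr_t)1)

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct page and struct zswap_entry. */
struct page_stub { int id; struct tlist lru; };
struct zswap_entry_stub { int id; struct tlist lru; };

static struct tlist *tl_next(const struct tlist *n)
{
	return (struct tlist *)(n->next & ~TAG_ZSWAP); /* strip the tag */
}

static bool tl_is_zswap(const struct tlist *n)
{
	return (n->next & TAG_ZSWAP) != 0;
}

/* Repoint n->next while preserving n's own type tag. */
static void tl_set_next(struct tlist *n, struct tlist *next)
{
	n->next = (uintptr_t)next | (n->next & TAG_ZSWAP);
}

static void tl_init_head(struct tlist *head)
{
	head->next = (uintptr_t)head; /* the head itself is never tagged */
	head->prev = head;
}

/* Add n at the most-recently-used end (right after the head). */
static void tl_add(struct tlist *head, struct tlist *n, bool zswap)
{
	struct tlist *first = tl_next(head);

	assert(((uintptr_t)n & TAG_ZSWAP) == 0); /* bit 0 must be free */
	n->next = (uintptr_t)first | (zswap ? TAG_ZSWAP : 0);
	n->prev = head;
	first->prev = n;
	tl_set_next(head, n);
}

static void tl_del(struct tlist *n)
{
	struct tlist *next = tl_next(n);
	struct tlist *prev = n->prev;

	tl_set_next(prev, next);
	next->prev = prev;
}

/*
 * Unlink the oldest object (the tail) for writeback and return its id,
 * or -1 if the list is empty. *was_zswap reports which kind it was.
 */
static int tl_pop_oldest(struct tlist *head, bool *was_zswap)
{
	struct tlist *tail = head->prev;

	*was_zswap = false;
	if (tail == head)
		return -1;
	*was_zswap = tl_is_zswap(tail);
	tl_del(tail);
	if (*was_zswap)
		return container_of(tail, struct zswap_entry_stub, lru)->id;
	return container_of(tail, struct page_stub, lru)->id;
}
```

The cost of this design is that every walker must mask the tag before following `next`; in exchange, incompressible pages can stay on the same LRU as zswap entries and age alongside them, with no per-page allocation on the reclaim path.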