From: Yosry Ahmed <yosryahmed@google.com>
Date: Tue, 17 Oct 2023 08:51:25 -0700
Subject: Re: [PATCH 0/2] minimize swapping on zswap store failure
To: Johannes Weiner
Cc: Nhat Pham, akpm@linux-foundation.org, cerasuolodomenico@gmail.com,
	sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com,
	hughd@google.com, corbet@lwn.net, konrad.wilk@oracle.com,
	senozhatsky@chromium.org, rppt@kernel.org, linux-mm@kvack.org,
	kernel-team@meta.com, linux-kernel@vger.kernel.org, david@ixit.cz,
	Wei Xu, Chris Li, Greg Thelen
In-Reply-To: <20231017145124.GA1122010@cmpxchg.org>
References: <20231017003519.1426574-1-nphamcs@gmail.com>
	<20231017044745.GC1042487@cmpxchg.org>
	<20231017145124.GA1122010@cmpxchg.org>

On Tue, Oct 17, 2023 at 7:51 AM Johannes Weiner wrote:
>
> On Mon, Oct 16, 2023 at 10:33:23PM -0700, Yosry Ahmed wrote:
> > On Mon, Oct 16, 2023 at 9:47 PM Johannes Weiner wrote:
> > > On Mon, Oct 16, 2023 at 05:57:31PM -0700, Yosry Ahmed wrote:
> > > > On Mon, Oct 16, 2023 at 5:35 PM Nhat Pham wrote:
> > > So I obviously agree that we still need to invest in decoupling zswap
> > > space from physical disk slots. It's insanely wasteful, especially
> > > with larger memory capacities. But while it would be a fantastic
> > > optimization, I don't see how it would be an automatic solution to the
> > > problem that inspired this proposal.
> >
> > Well, in my head, I imagine such a world where we have multiple
> > separate swapping backends with cgroup knob(s) that control what
> > backends are allowed for each cgroup. A zswap-is-terminal knob is a
> > hacky-ish way of doing that where the backends are only zswap and disk
> > swap.
>
> "I want compression" vs "I want disk offloading" is a more reasonable
> question to ask at the cgroup level. We've historically had a variety
> of swap configurations across the fleet. E.g. it's a lot easier to add
> another swapfile than it is to grow an existing one at runtime. In
> other cases, one storage config might have one swapfile, another
> machine model might want to spread it out over multiple disks, etc.
>
> This doesn't matter much with ghost files. But with conventional
> swapfiles this requires an unnecessary awareness of the backend
> topology in order to express container policy. That's no bueno.

Oh, I didn't mean that cgroups would designate specific swapfiles, but
rather swapping backends, which would be "zswap" or "disk" or both in
this case. I just imagined an interface that is more generic and
extensible than a specific zswap-is-terminal knob.

> > > > > Perhaps there is a way we can do this without allocating a zswap entry?
> > > >
> > > > I thought before about having a special list_head that allows us to
> > > > use the lower bits of the pointers as markers, similar to the xarray.
> > > > The markers can be used to place different objects on the same list.
> > > > We can have a list that is a mixture of struct page and struct
> > > > zswap_entry. I never pursued this idea, and I am sure someone will
> > > > scream at me for suggesting it. Maybe there is a less convoluted way
> > > > to keep the LRU ordering intact without allocating memory on the
> > > > reclaim path.
> > >
> > > That should work. Once zswap has exclusive control over the page, it
> > > is free to muck with its lru linkage. A lower bit tag on the next or
> > > prev pointer should suffice to distinguish between struct page and
> > > struct zswap_entry when pulling stuff from the list.
> >
> > Right.
> >
> > We handle incompressible memory internally in a different way: we put
> > those pages back on the unevictable list with an incompressible page
> > flag. This achieves a similar effect.
>
> It doesn't. We want those incompressible pages to continue aging
> alongside their compressible peers, and eventually get written back to
> disk with them.

Sorry, I wasn't clear; I was talking about the case where zswap is
terminal. When zswap is not, in our approach we just skip zswap for
incompressible pages and write them directly to disk. Aging them on the
LRU is probably the better approach here. For the case where zswap is
terminal, making them unevictable incurs fewer page faults, at least
for shmem.
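
To spell out the more generic interface I have in mind, here is a rough
sketch. Every name in it (memcg_swap_policy, SWAP_BACKEND_*,
may_fall_back_to_disk) is made up for illustration; nothing like this
exists in the kernel today:

#include <stdbool.h>

/*
 * Hypothetical sketch only: a per-cgroup bitmask of allowed swapping
 * backends. "zswap only" (i.e. zswap is terminal) and "zswap + disk"
 * are then just two settings of the same, more general knob.
 */
enum swap_backend {
	SWAP_BACKEND_ZSWAP = 1 << 0,
	SWAP_BACKEND_DISK  = 1 << 1,
};

struct memcg_swap_policy {
	unsigned int allowed_backends;	/* hypothetical per-cgroup field */
};

/* What reclaim would ask when zswap refuses or fails to store a page. */
static bool may_fall_back_to_disk(const struct memcg_swap_policy *pol)
{
	return pol->allowed_backends & SWAP_BACKEND_DISK;
}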
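
And to make the tagged-pointer idea above a bit more concrete, below is a
small self-contained userspace sketch of just the bit-tagging part (the
struct definitions and function names are placeholders, not kernel code;
the real thing would of course operate on the lru list_head linkage):

#include <stdint.h>
#include <stdio.h>

#define LRU_TAG_ZSWAP 0x1UL	/* bit 0 is free: both objects are pointer-aligned */

struct page { int id; };		/* stand-in for struct page */
struct zswap_entry { int id; };		/* stand-in for struct zswap_entry */

static uintptr_t lru_tag_page(struct page *p)
{
	return (uintptr_t)p;			/* bit 0 clear: plain page */
}

static uintptr_t lru_tag_zswap(struct zswap_entry *e)
{
	return (uintptr_t)e | LRU_TAG_ZSWAP;	/* bit 0 set: zswap entry */
}

/* Shrinker side: strip the tag and dispatch on the object type. */
static void reclaim_one(uintptr_t tagged)
{
	if (tagged & LRU_TAG_ZSWAP)
		printf("write back zswap_entry %d\n",
		       ((struct zswap_entry *)(tagged & ~LRU_TAG_ZSWAP))->id);
	else
		printf("write back page %d\n",
		       ((struct page *)(tagged & ~LRU_TAG_ZSWAP))->id);
}

int main(void)
{
	struct page p = { .id = 1 };
	struct zswap_entry e = { .id = 2 };
	uintptr_t lru[2] = { lru_tag_page(&p), lru_tag_zswap(&e) };

	for (int i = 0; i < 2; i++)
		reclaim_one(lru[i]);
	return 0;
}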