From: Joshua Hahn <joshua.hahnjy@gmail.com>
To: Yosry Ahmed
Cc: Yosry Ahmed, Nhat Pham, akpm@linux-foundation.org, hannes@cmpxchg.org,
	cerasuolodomenico@gmail.com, sjenning@redhat.com, ddstreet@ieee.org,
	vitaly.wool@konsulko.com, hughd@google.com, corbet@lwn.net,
	konrad.wilk@oracle.com, senozhatsky@chromium.org, rppt@kernel.org,
	linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org,
	david@ixit.cz
Subject: Re: [PATCH 0/2] minimize swapping on zswap store failure
Date: Tue, 22 Apr 2025 08:00:31 -0700
Message-ID: <20250422150033.1867401-1-joshua.hahnjy@gmail.com>
X-Mailer: git-send-email 2.47.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

> > > I thought before about having a special list_head that allows us to
> > > use the lower bits of the pointers as markers, similar to the xarray.
> > > The markers can be used to place different objects on the same list.
> > > We can have a list that is a mixture of struct page and struct
> > > zswap_entry. I never pursued this idea, and I am sure someone will
> > > scream at me for suggesting it. Maybe there is a less convoluted way
> > > to keep the LRU ordering intact without allocating memory on the
> > > reclaim path.
> >
> > So I've implemented your idea, using the lower 2 bits of the
> > list_head's prev pointer (the last bit indicates whether the list_head
> > belongs to a page or a zswap_entry, and the second-to-last bit was
> > repurposed for the second chance algorithm).
>
> Thanks a lot for spending time looking into this, and sorry for the
> delayed response (I am technically on leave right now).

Hi Yosry,

Thank you for getting back to me! I hope you are enjoying your leave :-)
In case the bit layout is hard to picture, I've put a small illustrative
sketch of the tagging scheme right below.
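Just to make the bit layout concrete, here is a rough userspace-style
sketch of the tagging helpers (the names are made up for illustration;
this is not the actual patch). Since both struct page and zswap_entry
are at least 4-byte aligned, the two low bits of the ->prev pointer are
always zero and are free to carry metadata:

#include <stdbool.h>
#include <stdint.h>

/* Bit 0: the LRU entry is a raw (incompressible) page, not a zswap_entry. */
#define LRU_TAG_PAGE           0x1UL
/* Bit 1: second-chance bit used by the LRU aging algorithm. */
#define LRU_TAG_SECOND_CHANCE  0x2UL
#define LRU_TAG_MASK           0x3UL

/* Fold tag bits into the low bits of an aligned pointer. */
static inline void *lru_tag_ptr(void *ptr, unsigned long tags)
{
        return (void *)((uintptr_t)ptr | tags);
}

/* Strip the tag bits to recover the real pointer. */
static inline void *lru_untag_ptr(void *ptr)
{
        return (void *)((uintptr_t)ptr & ~LRU_TAG_MASK);
}

/* Does this LRU entry point at a raw page rather than a zswap_entry? */
static inline bool lru_entry_is_page(void *ptr)
{
        return (uintptr_t)ptr & LRU_TAG_PAGE;
}

The xarray entry gets the same treatment (lowest bit set for a raw page),
which is how zswap_load can tell whether it should decompress a
zswap_entry or simply copy the stored page's contents.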
> > For a very high level overview of what I did in the patch:
> > - When a page fails to compress, I remove the page mapping and tag
> >   both the xarray entry (tag == set lowest bit to 1) and the page's
> >   list_head prev ptr, then store the page directly into the zswap LRU.
>
> What do you mean by 'remove the page mapping'? Do you mean
> __remove_mapping()?

Yes -- but I am calling remove_mapping() to unfreeze with a refcount of 1
(zswap is now the sole owner of the page).

> This is already called by reclaim, so I assume vmscan code hands over
> ownership of the page to zswap and doesn't call __remove_mapping(), so
> you end up doing that in zswap instead.

Yes! I changed the reclaim logic to be aware that zswap can do this, so
there is a new switch case that simply continues through the folio list
when zswap steals the incompressible page (since we don't want to drop
the page).

> > - In zswap_load, we take the entry out of the xarray and check if it's
> >   tagged.
> > - If it is tagged, then instead of decompressing, we just copy the
> >   page's contents to the newly allocated page.
> > - (More details about how to teach vmscan / page_io / list iterators
> >   how to handle this, but we can gloss over those details for now)
> >
> > I have a working version, but have been holding off because I have
> > only been seeing regressions. I wasn't really sure where they were
> > coming from, but after going through some perf traces with Nhat, I
> > found out that the regressions come from the page faults caused by
> > initially unmapping the page and then re-allocating it for every load.
> > This causes (1) more memcg flushing, and (2) extra allocations ==> more
> > pressure ==> more reclaim, even though we only keep the extra page
> > temporarily.
>
> Hmm how is this worse than the status quo though? IIUC currently
> incompressible pages will skip zswap and go to the backing swapfile.
> Surely reading them from disk is slower than copying them?
>
> Unless of course, writeback is disabled, in which case these pages are
> not being reclaimed at all today. In this case, it makes sense that
> reclaiming them makes accessing them slower, even if we don't actually
> need to decompress them.

Yes, sorry for the ambiguity -- this was specifically for the
writeback-disabled case. My focus currently is on reducing the number of
CPU cycles spent trying to compress incompressible pages; the sketch
below is just to make that cost concrete.
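To illustrate the wasted work (a userspace-only sketch using zlib rather
than the kernel crypto API that zswap actually goes through; build with
-lz): a full compression pass has to run before we can even tell that the
page will not shrink, and for an incompressible page all of that CPU time
buys us nothing.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

#define PAGE_SIZE 4096

/* Returns 1 if a compression pass fails to shrink the page. */
static int page_is_incompressible(const unsigned char *page)
{
        unsigned char out[PAGE_SIZE + 128]; /* >= compressBound(PAGE_SIZE) */
        uLongf out_len = sizeof(out);

        if (compress2(out, &out_len, page, PAGE_SIZE,
                      Z_DEFAULT_COMPRESSION) != Z_OK)
                return 1;
        return out_len >= PAGE_SIZE;
}

int main(void)
{
        unsigned char page[PAGE_SIZE];
        size_t i;

        /* Random bytes model an incompressible page... */
        for (i = 0; i < PAGE_SIZE; i++)
                page[i] = (unsigned char)rand();
        printf("random page incompressible: %d\n",
               page_is_incompressible(page));

        /* ...while a zero-filled page compresses trivially. */
        memset(page, 0, PAGE_SIZE);
        printf("zeroed page incompressible: %d\n",
               page_is_incompressible(page));
        return 0;
}

With writeback disabled, nothing stops the same hot incompressible page
from being fed back into that pass over and over, which is where I
suspect most of the win comes from.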
> I have a few thoughts in mind:
>
> - As Nhat said, if we can keep the pages in the swapcache, we can avoid
>   making a new allocation and copying the page. We'd need to move it
>   back from zswap LRUs to the reclaim LRUs though.

Yes, Nhat and Shakeel have both offered the same perspective. I'm
currently working on this approach with help from Nhat.

> - One advantage of keeping incompressible pages in zswap is preserving
>   LRU ordering. IOW, if some compressible pages go to zswap first (old),
>   then some incompressible pages (new), then the old compressible pages
>   should go to disk via writeback first. Otherwise, accessing the hotter
>   incompressible pages will be slower than accessing the colder
>   compressible pages. This happens today because incompressible pages go
>   straight to disk.
>
> The above will only materialize for a workload that has writeback
> enabled and a mixture of both incompressible and compressible
> workingset.

This makes sense to me. I'll take this into consideration when writing
benchmarks for this patch!

> The other advantage, as you mention below, is preventing repeatedly
> sending incompressible pages to zswap when writeback is disabled, but
> that could be offset by the extra cost of allocations/copying.

Yes -- hopefully, keeping the page in the swapcache lets us reap the
benefits of both worlds, minus the duplicated allocation and copying.

> - The above being said, we should not regress workloads that have
>   writeback disabled, so we either need to keep the pages in the
>   swapcache to avoid the extra allocations/copies -- or avoid storing
>   the pages in zswap completely if writeback is disabled. If writeback
>   is disabled and the page is incompressible, we could probably just put
>   it in the unevictable LRU because that's what it really is. We'd need
>   to make sure we remove it when it becomes compressible again. The
>   first approach is probably simpler.

This is a good point. While I don't have the numbers to back this up, my
intuition is that this patch gets most of its benefit from reducing the
CPU time spent trying to compress incompressible pages, rather than from
maintaining a better LRU ordering. For that reason, I also suspect that
the performance gains are much larger when writeback is disabled.

Following that logic... maybe there is some future work to be done that
just moves these incompressible pages to an unevictable LRU when
writeback is disabled?

For now, I'm still experimenting with keeping the page in the swapcache.
I'll be sure to report back with cool findings!

Thank you again for your review of this idea, Yosry. I hope you have a
great day!
Joshua