From: Yosry Ahmed
Date: Wed, 23 Nov 2022 00:02:39 -0800
Subject: Re: [PATCH v6 4/6] zsmalloc: Add a LRU to zs_pool to keep track of zspages in LRU order
To: Sergey Senozhatsky
Cc: Johannes Weiner, Nhat Pham, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, minchan@kernel.org, ngupta@vflare.org, sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com
References: <20221119001536.2086599-1-nphamcs@gmail.com> <20221119001536.2086599-5-nphamcs@gmail.com>

On Tue, Nov 22, 2022 at 7:50 PM Sergey Senozhatsky wrote:
>
> On (22/11/22 12:42), Johannes Weiner wrote:
> > On Tue, Nov 22, 2022 at 10:52:58AM +0900, Sergey Senozhatsky wrote:
> > > On (22/11/18 16:15), Nhat Pham wrote:
> > > [..]
> > > > @@ -1249,6 +1267,15 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
> > > >  	obj_to_location(obj, &page, &obj_idx);
> > > >  	zspage = get_zspage(page);
> > > >
> > > > +#ifdef CONFIG_ZPOOL
> > > > +	/* Move the zspage to front of pool's LRU */
> > > > +	if (mm == ZS_MM_WO) {
> > > > +		if (!list_empty(&zspage->lru))
> > > > +			list_del(&zspage->lru);
> > > > +		list_add(&zspage->lru, &pool->lru);
> > > > +	}
> > > > +#endif
> > >
> > > Do we consider pages that were mapped for MM_RO/MM_RW as cold?
> > > I wonder why, we use them, so technically they are not exactly
> > > "least recently used".
> >
> > This is a swap LRU. Per definition there are no ongoing accesses to
> > the memory while the page is swapped out that would make it "hot".
>
> Hmm. Not arguing, just trying to understand some things.
>
> There are no accesses to swapped out pages yes, but zspage holds multiple
> objects, which are compressed swapped out pages in this particular case.
> For example, zspage in class size 176 (bytes) can hold 93 objects per-zspage,
> that is 93 compressed swapped out pages. Consider ZS_FULL zspages which
> is at the tail of the LRU list. Suppose that we page-faulted 20 times and
> read 20 objects from that zspage, IOW zspage has been in use 20 times very
> recently, while writeback still considers it to be "not-used" and will
> evict it.
>
> So if this works for you then I'm fine. But we probably, like you suggested,
> can document a couple of things here - namely why WRITE access to zspage
> counts as "zspage is in use" but READ access to the same zspage does not
> count as "zspage is in use".
>

I guess the key here is that we have an LRU of zspages, when we really
want an LRU of compressed objects. In some cases, we may end up
reclaiming the wrong pages.

Assuming we have 2 zspages, Z1 and Z2, and 4 physical pages that we
compress over time, P1 -> P4.
Let's assume P1 -> P4 get compressed in order (P4 is the hottest
page), and they get assigned to zspages as follows:

Z1: P1, P3
Z2: P2, P4

In this case, the zspages LRU would be Z2->Z1, because Z2 was touched
last when we compressed P4. Now if we want to write back, we will look
at Z1, and we might end up reclaiming P3, depending on the order the
pages are stored in.

A worst case scenario of this would be if we have a large number of
pages, maybe 1000, P1->P1000 (where P1000 is the hottest), and they
all go into Z1 and Z2 in this way:

Z1: P1 -> P499, P1000
Z2: P500 -> P999

In this case, Z1 contains 499 cold pages, but it got P1000 at the end,
which caused us to put it on the front of the LRU. Now writeback will
consistently use Z2. This is bad. Now I have no idea how practical
this is, but it seems fairly random, based on the compression size of
pages and access patterns.

Does this mean we should move zspages to the front of the LRU when we
write back from them? No, I wouldn't say so. The same exact scenario
can happen because of this. Imagine the following assignment of the
1000 pages:

Z1: P1, P3, .., P999
Z2: P2, P4, .., P1000

Z2 is at the front of the LRU because it has P1000, so the first time
we do writeback we will start at Z1. Once we reclaim one object from
Z1, we will start writeback from Z2 next time, and we will keep
alternating. Now if we are really unlucky, we can end up reclaiming in
this order: P999, P1000, P997, P998, ...

So yeah, I don't think putting zspages in the front of the LRU when we
write back is the answer. I would even say it's completely orthogonal
to the problem, because writing back an object from the zspage at the
end of the LRU gives us 0 information about the state of other objects
on the same zspage.

Ideally, we would have an LRU of objects instead, but this would be
very complicated with the current form of writeback.
It would be much easier if we had an LRU of zswap entries instead,
which is something I am looking into, but it is a much bigger surgery
and should be separate from this work.

Today zswap inverts LRU priorities anyway by sending hot pages to the
swapfile when zswap is full while colder pages stay in zswap, so I
wouldn't really worry about this now :)
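
For illustration only, the two 1000-page scenarios above can be played
through in a toy userspace model. This is a sketch, not kernel code:
struct toy_zspage, toy_store(), toy_lru_tail(), toy_worst_case() and
toy_alternating_writeback() are all hypothetical names. The model keeps
one logical timestamp per zspage instead of a real list, and the second
scenario assumes the unlucky case where the hottest remaining object of
the tail zspage is the one that gets reclaimed.

```c
#include <stddef.h>

/*
 * Toy userspace model, NOT kernel code. All names are hypothetical.
 * A zspage only remembers when it was last moved to the LRU head;
 * nothing is recorded per object.
 */
struct toy_zspage {
	int last_touch;	/* logical time of the last move to the LRU head */
	int coldest;	/* lowest page number stored here (lower = colder) */
	int hottest;	/* highest page number stored here */
	int nr_objs;	/* number of compressed pages stored here */
};

/* Store page P<page> at time 'now'; models the ZS_MM_WO map in the patch. */
static void toy_store(struct toy_zspage *z, int page, int now)
{
	if (z->nr_objs == 0 || page < z->coldest)
		z->coldest = page;
	if (z->nr_objs == 0 || page > z->hottest)
		z->hottest = page;
	z->nr_objs++;
	z->last_touch = now;
}

/* Index of the LRU tail: the zspage that was touched longest ago. */
static size_t toy_lru_tail(const struct toy_zspage *zs, size_t n)
{
	size_t i, tail = 0;

	for (i = 1; i < n; i++)
		if (zs[i].last_touch < zs[tail].last_touch)
			tail = i;
	return tail;
}

/* Scenario 1: zs[0] (Z1) = P1..P499 + P1000, zs[1] (Z2) = P500..P999. */
static size_t toy_worst_case(struct toy_zspage zs[2])
{
	int p;

	zs[0] = (struct toy_zspage){ 0 };
	zs[1] = (struct toy_zspage){ 0 };
	for (p = 1; p <= 1000; p++)
		toy_store(&zs[(p <= 499 || p == 1000) ? 0 : 1], p, p);
	return toy_lru_tail(zs, 2);
}

/*
 * Scenario 2: Z1 = odd pages, Z2 = even pages, and writeback moves the
 * zspage it just reclaimed from to the LRU head. Assumes the hottest
 * remaining object of the tail zspage is reclaimed first.
 * Writes up to n reclaimed page numbers to out[] and returns the count.
 */
static size_t toy_alternating_writeback(int *out, size_t n)
{
	struct toy_zspage zs[2] = { { 0 }, { 0 } };
	int p, now;
	size_t i;

	for (p = 1; p <= 1000; p++)
		toy_store(&zs[(p % 2) ? 0 : 1], p, p);
	now = 1000;
	for (i = 0; i < n; i++) {
		struct toy_zspage *z = &zs[toy_lru_tail(zs, 2)];

		if (z->nr_objs == 0)
			break;
		out[i] = z->hottest;	/* reclaim the hottest object left */
		z->hottest -= 2;
		z->nr_objs--;
		z->last_touch = ++now;	/* move-to-front on writeback */
	}
	return i;
}
```

In this model toy_worst_case() returns index 1 (Z2) as the LRU tail even
though the 499 coldest pages sit in Z1, and toy_alternating_writeback()
asked for four victims produces 999, 1000, 997, 998. Both results follow
only from the per-zspage timestamp, which is the point made above: the
zspage's position says nothing about the individual objects inside it.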